SUBSAMPLING FOR GENERAL STATISTICS UNDER LONG RANGE DEPENDENCE WITH APPLICATION TO CHANGE POINT ANALYSIS

ANNIKA BETKEN AND MARTIN WENDLER 


Abstract. In statistical inference for long range dependent time series the shape of the limit distribution typically depends on unknown parameters. Therefore, we propose to use subsampling. We show the validity of subsampling for general statistics and long range dependent subordinated Gaussian processes which satisfy mild regularity conditions. We apply our method to a self-normalized change-point test statistic so that we can test for structural breaks in long range dependent time series without having to estimate any nuisance parameter. The finite sample properties are investigated in a simulation study. We analyze three data sets and compare our results to the conclusions of other authors.


1. Introduction 

1.1. Long Range Dependence. While most statistical research is done for independent data or short memory time series, in many applications there are also time series with long memory in the sense of slowly decaying correlations: in hydrology (starting with the work of Hurst [31]), in finance (e.g. Lo [35]), in the analysis of network traffic (e.g. Leland, Taqqu, Willinger and Wilson [?]) and in many other fields of research.

As a model for dependent time series we will consider subordinated Gaussian processes: Let $(\xi_n)_{n\in\mathbb{N}}$ be a stationary sequence of centered Gaussian variables with $\operatorname{Var}(\xi_n) = 1$ and covariance function $\gamma$ satisfying

$$\gamma(k) := \operatorname{Cov}(\xi_1, \xi_{k+1}) = k^{-D} L_\gamma(k) \qquad (1)$$

for $D > 0$ and a slowly varying function $L_\gamma$. If $D < 1$, the spectral density $f$ of $(\xi_n)_{n\in\mathbb{N}}$ is not continuous, but has a pole at 0. The spectral density has the form

$$f(x) = |x|^{D-1} L_f(x)$$

for a function $L_f$ which is slowly varying at the origin (see Proposition 1.1.14 in Pipiras and Taqqu [44]).

Furthermore, let $G: \mathbb{R} \to \mathbb{R}$ be a measurable function such that $E[G^2(\xi_1)] < \infty$. The stochastic process $(X_n)_{n\in\mathbb{N}}$ given by

$$X_n := G(\xi_n)$$

is called long range dependent if $\sum_{n=0}^{\infty} |\operatorname{Cov}(X_1, X_{n+1})| = \infty$, and short range dependent if $\sum_{n=0}^{\infty} |\operatorname{Cov}(X_1, X_{n+1})| < \infty$.

In limit theorems for the partial sum $S_n = \sum_{i=1}^{n} X_i$, the normalization and the shape of the limit distribution depend not only on the decay of the covariances $\gamma(k)$ as $k \to \infty$, but also on the function $G$. More precisely, Taqqu [55] and Dobrushin and Major [?] independently proved that

$$\frac{1}{L_\gamma(n)^{r/2}\, n^{H}} \sum_{i=1}^{n} \left(X_i - E[X_i]\right) \Rightarrow C(r,H)\, g_r\, Z_{r,H}(1)$$

if the Hurst parameter $H := \max\{1 - \frac{rD}{2}, \frac{1}{2}\}$ is greater than $\frac{1}{2}$. Here, $r$ denotes the Hermite rank of the function $G$, $C(r,H)$ is a constant, $g_r$ is the first non-zero coefficient in the expansion of $G$ as a sum of Hermite polynomials and $Z_{r,H}$ is a Hermite process. For more details on Hermite polynomials and limit theorems for subordinated Gaussian processes we recommend the book of Pipiras and Taqqu [44]. In this case ($rD < 1$), the process $(X_n)_{n\in\mathbb{N}}$ is long range dependent as the covariances are not summable. Note that the limiting random variable $C(r,H) Z_{r,H}(1)$ is Gaussian only if the Hermite rank is $r = 1$.

If $rD = 1$, the process $(X_n)_{n\in\mathbb{N}}$ might be short or long range dependent according to the slowly varying function $L_\gamma$. If $rD > 1$, the process is short range dependent. In this case, the partial sum $\sum_{i=1}^{n}(X_i - E[X_i])$ always has (with proper normalization) a Gaussian limit.
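To make the Hermite rank $r$ and the coefficient $g_r$ concrete, here is a minimal Python sketch (assuming NumPy) that approximates $E[G(\xi_1)H_q(\xi_1)]$ by Gauss-Hermite quadrature and returns the first $q \ge 1$ with a non-zero coefficient. It uses probabilists' Hermite polynomials (NumPy's `hermite_e` module); conventions that divide by $q!$ change $g_r$ by a constant factor but not $r$. The helper names `hermite_coefficients` and `hermite_rank` are hypothetical, and quadrature is only reliable for smooth $G$ (indicator functions, as in Section 3, need another method).

```python
import math

import numpy as np
from numpy.polynomial import hermite_e as He


def hermite_coefficients(G, q_max=10, n_nodes=60):
    """Approximate E[G(xi) He_q(xi)], xi ~ N(0, 1), for q = 0, ..., q_max.

    Gauss-Hermite quadrature for the weight exp(-x^2/2); the weights sum to
    sqrt(2*pi), so dividing by it turns the quadrature sum into an expectation.
    Only reliable for smooth G (e.g. polynomials), not for indicator functions.
    """
    x, w = He.hermegauss(n_nodes)
    w = w / math.sqrt(2.0 * math.pi)
    gx = np.asarray(G(x), dtype=float)
    coeffs = []
    for q in range(q_max + 1):
        basis = np.zeros(q + 1)
        basis[q] = 1.0  # coefficient vector that selects He_q
        coeffs.append(float(np.sum(w * gx * He.hermeval(x, basis))))
    return coeffs


def hermite_rank(G, q_max=10, tol=1e-8):
    """Smallest q >= 1 with a non-zero Hermite coefficient (None if none up to q_max)."""
    coeffs = hermite_coefficients(G, q_max)
    for q in range(1, q_max + 1):
        if abs(coeffs[q]) > tol:
            return q
    return None


print(hermite_rank(lambda x: x ** 2))  # 2
```

For instance, $G(x) = x^2$ has Hermite rank 2 (with $E[\xi^2 H_2(\xi)] = 2$), whereas $G(x) = x^3$ has rank 1.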

There are other models for long memory processes: Fractionally integrated autoregressive moving average processes can show long range dependence, see Granger and Joyeux [28]. General linear processes with slowly decaying coefficients were studied by Surgailis [55].

1.2. Subsampling. For practical applications the parameters $D$, $r$ and the slowly varying function $L_\gamma$ are unknown and thus the scaling needed in the limit theorems and the shape of the asymptotic distribution are not known, either. That makes it difficult to use the asymptotic distribution for statistical inference. The situation gets even more complicated if one is not interested in partial sums, but in nonlinear statistical functionals. For example, U-statistics can have a limit distribution which is a linear combination of random variables related to different Hermite ranks, see Beutner and Zähle [?]. Self-normalized statistics typically converge to quotients of two random variables (e.g. McElroy and Politis [42]). The change-point test proposed by Berkes, Horváth, Kokoszka and Shao [?] converges to the supremum of a fractional Brownian bridge under the alternative hypothesis.

To overcome the problem of the unknown shape of the limit distribution and to avoid the estimation of nuisance parameters, one would like to use nonparametric methods. However, Lahiri [36] has shown that the popular moving block bootstrap might fail under long range dependence. Another nonparametric approach is subsampling (also called the sampling window method), first studied by Politis and Romano [35], Hall and Jing [50], and Sherman and Carlstein [50]. The idea is the following: Let $T_n = T_n(X_1,\dots,X_n)$ be a sequence of statistics converging in distribution to a random variable $T$. As we typically have just one sample, we observe only one realization of $T_n$ and therefore cannot estimate the distribution of $T_n$ directly. If $l = l_n$ is a sequence with $l_n \to \infty$ and $l_n = o(n)$, then $T_l$ also converges in distribution to $T$, and we have multiple (though dependent) realizations $T_l(X_1,\dots,X_l), T_l(X_2,\dots,X_{l+1}), \dots, T_l(X_{n-l+1},\dots,X_n)$, which can be used to calculate an empirical distribution function.

Note that we do not need to know the limit distribution. In our example (self-normalized change-point test statistic, see Section 3), the shape of the distribution depends on two unknown parameters, but we can still apply subsampling. However, for other statistics, one needs an unknown scaling to achieve convergence. If this is the case, one has to estimate the scaling parameters before applying subsampling.

Under long range dependence the validity of subsampling for the sample mean $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ has been investigated in the literature, starting with Hall, Jing and Lahiri [29] for subordinated Gaussian processes. Nordman and Lahiri [43] and Zhang, Ho, Wendler and Wu [55] studied linear processes with slowly decaying coefficients. For the case of Gaussian processes an alternative proof can be found in the book of Beran, Feng, Ghosh and Kulik [?]. It was noted by Fan [?] that the proof in [23] can easily be generalized to statistics other than the sample mean. However, the assumptions on the Gaussian process are restrictive (see also [42]). Their conditions imply that the sequence $(\xi_n)_{n\in\mathbb{N}}$ is completely regular, which might hold for some special cases (see Ibragimov and Rozanov [32]), but excludes many examples:

Example 1 (Fractional Gaussian Noise). Let $(B_H(t))_{t\in[0,\infty)}$ be a fractional Brownian motion, i.e. a centered, self-similar Gaussian process with covariance function

$$E[B_H(t)B_H(s)] = \frac{1}{2}\left(|t|^{2H} + |s|^{2H} - |t-s|^{2H}\right)$$

for some $H \in (\frac{1}{2}, 1)$. Then, $(\xi_n)_{n\in\mathbb{N}}$ given by $\xi_n = B_H(n) - B_H(n-1)$ is called fractional Gaussian noise. By self-similarity we have

$$\operatorname{corr}\Big(\sum_{i=1}^{n}\xi_i, \sum_{j=2n+1}^{3n}\xi_j\Big) = \operatorname{corr}\big(B_H(n), B_H(3n) - B_H(2n)\big) = \operatorname{corr}\big(B_H(1), B_H(3) - B_H(2)\big).$$

As a result, the correlations of linear combinations of observations in the past and in the future do not vanish as the gap between past and future grows. Thus, fractional Gaussian noise is not completely regular.
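The non-vanishing correlation in Example 1 can be checked numerically. The sketch below (assuming NumPy; the function names are illustrative) sums the fractional Gaussian noise autocovariances over the two blocks and compares the result with the closed form $\operatorname{corr}(B_H(1), B_H(3) - B_H(2)) = \frac{1}{2}(1 + 3^{2H} - 2^{2H+1})$, which follows directly from the covariance function of $B_H$.

```python
import numpy as np


def fgn_autocov(k, H):
    """Autocovariance gamma(k) of unit-variance fractional Gaussian noise."""
    k = np.abs(np.asarray(k, dtype=float))
    return 0.5 * (np.abs(k - 1) ** (2 * H) - 2 * k ** (2 * H) + (k + 1) ** (2 * H))


def block_corr(n, H):
    """corr(xi_1 + ... + xi_n, xi_{2n+1} + ... + xi_{3n}) from the autocovariances."""
    i = np.arange(1, n + 1)
    j = np.arange(2 * n + 1, 3 * n + 1)
    cross = fgn_autocov(j[None, :] - i[:, None], H).sum()
    var = fgn_autocov(i[None, :] - i[:, None], H).sum()  # both blocks have this variance
    return cross / var


def limit_corr(H):
    """Closed form of corr(B_H(1), B_H(3) - B_H(2))."""
    return 0.5 * (1 + 3 ** (2 * H) - 2 ** (2 * H + 1))


for n in (1, 10, 100):
    print(n, round(block_corr(n, 0.8), 6), round(limit_corr(0.8), 6))
```

For $H = 0.8$ the correlation is about 0.37 for every block size $n$, while for $H = 1/2$ (independent increments) it is 0.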

Jach, McElroy and Politis [33] provided a more general result on the validity of subsampling. They assume that the function $G$ has Hermite rank 1, that $G$ is invertible and Lipschitz-continuous, and that the process $(\xi_n)_{n\in\mathbb{N}}$ has a causal representation as a functional of an independent sequence of random variables. These assumptions are difficult to check in practice. Moreover, although not explicitly stated in [33], the statistic $T_n$ has to be Lipschitz-continuous (uniformly in $n$), which is not satisfied by many robust estimators (see Section 3 for an example).

The main aim of this paper is to establish the validity of the subsampling method for general statistics $T_n$ without any assumptions on the continuity of the statistic or on the function $G$, and with only mild assumptions on the Gaussian process $(\xi_n)_{n\in\mathbb{N}}$. Independently of our research, similar theorems have been proved by Bai, Taqqu and Zhang [6]. We will discuss their results after our main theorem in Section 2. In Section 3 we will apply our theorem to a self-normalized, robust change-point statistic. The finite sample properties of this test will be investigated in a simulation study in Section 4. Finally, the proof of the main result and the lemmas needed can be found in Section 5.


2. Main Results 


2.1. Statement of the Theorem. For a statistic $T_n = T_n(X_1,\dots,X_n)$ the subsampling estimator $F_{l,n}$ of the distribution function $F_{T_n}$ with $F_{T_n}(t) = P(T_n \le t)$ is defined in the following way: For $t \in \mathbb{R}$ let

$$F_{l,n}(t) = \frac{1}{n-l+1}\sum_{i=1}^{n-l+1} 1_{\{T_l(X_i,\dots,X_{i+l-1}) \le t\}}.$$
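A direct transcription of $F_{l,n}$ into Python (assuming NumPy) evaluates $T_l$ on all $n - l + 1$ overlapping blocks. The function names are hypothetical, the statistic is passed in as any callable that accepts a one-dimensional array, and the quantile rule used for the test decision (NumPy's default linear interpolation) is one of several reasonable conventions.

```python
import numpy as np


def subsample_values(x, statistic, l):
    """T_l on all n - l + 1 overlapping blocks (X_i, ..., X_{i+l-1})."""
    x = np.asarray(x)
    n = len(x)
    if not 1 <= l <= n:
        raise ValueError("block length must satisfy 1 <= l <= n")
    return np.array([statistic(x[i:i + l]) for i in range(n - l + 1)])


def subsampling_cdf(x, statistic, l):
    """Return t -> F_{l,n}(t), the fraction of block statistics that are <= t."""
    vals = np.sort(subsample_values(x, statistic, l))

    def F(t):
        return np.searchsorted(vals, t, side="right") / len(vals)

    return F


def subsampling_test(x, statistic, l, alpha=0.05):
    """Reject H if T_n exceeds the empirical (1 - alpha)-quantile of the block values."""
    vals = subsample_values(x, statistic, l)
    crit = float(np.quantile(vals, 1 - alpha))
    t_n = float(statistic(np.asarray(x)))
    return t_n > crit, t_n, crit


F = subsampling_cdf(np.arange(10.0), np.mean, 3)
print(F(4.0))  # 0.5: block means are 1, 2, ..., 8
```

With the sample mean as statistic, $X_i = i - 1$ for $i = 1, \dots, 10$ and $l = 3$, the block means are $1, 2, \dots, 8$, so $F_{3,10}(4) = 4/8$.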

Our first assumption guarantees the convergence of the distribution function $F_{T_n}$:

Assumption 1. $(X_n)_{n\in\mathbb{N}}$ is a stochastic process and $(T_n)_{n\in\mathbb{N}}$ is a sequence of statistics such that $T_n \Rightarrow T$ in distribution as $n \to \infty$ for a random variable $T$ with distribution function $F_T$.

This is a standard assumption for subsampling, see for example [?5]. If the distribution does not converge, we cannot expect the distribution of $T_l$ to be close to the distribution of $T_n$.

Next, we will formulate our conditions on the sequence of random variables $(X_n)_{n\in\mathbb{N}}$:

Assumption 2. $X_n = G(\xi_n)$ for a measurable function $G$ and a stationary Gaussian process $(\xi_n)_{n\in\mathbb{N}}$ with covariance function

$$\gamma(k) := \operatorname{Cov}(\xi_1, \xi_{1+k}) = k^{-D} L_\gamma(k)$$

such that the following conditions hold:

(1) $D \in (0,1]$ and $L_\gamma$ is a slowly varying function with

$$\max_{\tilde{k} \in \{k+l', \dots, k+2l'-1\}} \left|L_\gamma(\tilde{k}) - L_\gamma(k)\right| \le K\, \frac{l'}{k} \min\{L_\gamma(k), 1\}$$

for a constant $K < \infty$ and all $l' \in \{1, \dots, k\}$.

(2) $(\xi_n)_{n\in\mathbb{N}}$ has a spectral density $f$ with $f(x) = |x|^{D-1} L_f(x)$ for a slowly varying function $L_f$ which is bounded away from 0 on $[0,\pi]$ such that $\lim_{x\to 0} L_f(x) \in (0,\infty]$ exists.


While we have some regularity conditions on the underlying Gaussian process $(\xi_n)_{n\in\mathbb{N}}$, we do not impose any conditions on the function $G$: no finite moments or continuity are required, so that our results are applicable to heavy-tailed random variables and robust test statistics. In the next subsection we will show that Assumption 2 holds for some standard examples of long range dependent Gaussian processes.

Furthermore, we need a restriction on the growth rate of the block length $l$:

Assumption 3. Let $(l_n)_{n\in\mathbb{N}}$ be a non-decreasing sequence of integers such that $l = l_n \to \infty$ as $n \to \infty$ and $l_n = O(n^{(1+D)/2 - \varepsilon})$ for some $\varepsilon > 0$.

If the dependence of the underlying process $(\xi_n)_{n\in\mathbb{N}}$ gets stronger (that is, $D$ gets smaller), the range of possible values for $l$ gets smaller. A popular choice for the block length is $l \approx C\sqrt{n}$ (see for example [25]), which is allowed for all $D \in (0,1]$ since $\frac{1}{2} < \frac{1+D}{2}$. Now, we can state our main result:
Theorem 1. Under Assumptions 1, 2 and 3 we have

$$F_{l,n}(t) - F_{T_n}(t) \xrightarrow{P} 0$$

as $n \to \infty$ for all points of continuity $t$ of $F_T$. If $F_T$ is continuous, then

$$\sup_{t\in\mathbb{R}} \left|F_{T_n}(t) - F_{l,n}(t)\right| \xrightarrow{P} 0.$$


As a result, we have a consistent estimator for the distribution function of T n . 
It is possible to build tests and confidence intervals based on this estimator. 

If $D > 1$, the process $(\xi_n)_{n\in\mathbb{N}}$ is strongly mixing due to Theorem 9.8 in the book of Bradley [18]. The statements of Theorem 1 then hold by Corollary 3.2 in [45] for any block length $l$ satisfying $l \to \infty$ and $l = o(n)$.

In a recent article, Bai et al. [5] have shown that subsampling is consistent for long range dependent Gaussian processes without any extra assumptions on the slowly varying function $L_f$, but with a stronger restriction on the block size $l$, namely $l = o(n^{2-2H} L_\gamma(n))$. In another article by Bai and Taqqu [3], the validity of subsampling is shown under the mildest possible assumption on the block length ($l = o(n)$). Their condition on the spectral density is slightly stronger than ours: the case $\lim_{x\to 0} L_f(x) = \infty$ is not allowed.


2.2. Examples for our Assumptions. We will now give two examples of Gaussian processes satisfying Assumption 2.


Example 2 (Fractional Gaussian Noise). Fractional Gaussian noise $(\xi_n)_{n\in\mathbb{N}}$ with Hurst parameter $H$ as introduced in Example 1 has the covariance function

$$\gamma(k) = \frac{1}{2}\left(|k-1|^{2H} - 2|k|^{2H} + |k+1|^{2H}\right) = H(2H-1)\left(k^{-D} + h(k)\,k^{-D-1}\right)$$

for $D = 2 - 2H$ and a function $h$ bounded by a constant $M < \infty$. This can easily be seen by means of a Taylor expansion. Hence, $L_\gamma(k) = H(2H-1)(1 + h(k)/k)$ and for all $\tilde{k} > k$

$$\left|L_\gamma(\tilde{k}) - L_\gamma(k)\right| \le H(2H-1)\left|\frac{h(\tilde{k})}{\tilde{k}} - \frac{h(k)}{k}\right| \le H(2H-1)\,\frac{2M}{k} =: \frac{K}{k}.$$


This implies part 1 of Assumption 2. For the second part note that the spectral density $f$ corresponding to fractional Gaussian noise is given by

$$f(\lambda) = C(H)(1 - \cos\lambda)\sum_{k=-\infty}^{\infty}|\lambda + 2k\pi|^{D-3} = \lambda^{D-1}\, C(H)\, \frac{1 - \cos\lambda}{\lambda^2}\, \frac{\sum_{k=-\infty}^{\infty}|\lambda + 2k\pi|^{D-3}}{\lambda^{D-3}},$$

see Sinai [?]. The slowly varying function

$$L_f(\lambda) = C(H)\, \frac{1 - \cos\lambda}{\lambda^2}\, \frac{\sum_{k=-\infty}^{\infty}|\lambda + 2k\pi|^{D-3}}{\lambda^{D-3}}$$

is bounded away from 0 because this holds for the first factor $(1 - \cos\lambda)/\lambda^2$ and since

$$\sum_{k=-\infty}^{\infty}|\lambda + 2k\pi|^{D-3} \ge |\lambda + 0\cdot\pi|^{D-3} = \lambda^{D-3}.$$
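The Taylor-expansion argument in Example 2 can be checked numerically: $L_\gamma(k) = k^{D}\gamma(k)$ should approach $H(2H-1)$ with an error of order $1/k$ or smaller. A small NumPy sketch (function names are illustrative; for very large $k$ the difference of powers loses floating-point precision to cancellation, so moderate $k$ are used):

```python
import numpy as np


def fgn_autocov(k, H):
    """gamma(k) = (|k-1|^{2H} - 2|k|^{2H} + |k+1|^{2H}) / 2 for fractional Gaussian noise."""
    k = np.abs(np.asarray(k, dtype=float))
    return 0.5 * (np.abs(k - 1) ** (2 * H) - 2 * k ** (2 * H) + (k + 1) ** (2 * H))


def L_gamma(k, H):
    """Slowly varying part L_gamma(k) = k^D gamma(k) with D = 2 - 2H."""
    D = 2 - 2 * H
    k = np.asarray(k, dtype=float)
    return k ** D * fgn_autocov(k, H)


H = 0.7
for k in (10, 100, 1000):
    # relative distance to the limit H(2H - 1); it shrinks as k grows
    print(k, abs(L_gamma(k, H) / (H * (2 * H - 1)) - 1))
```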
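Example 3 (Gaussian FARIMA processes). Let $(\varepsilon_n)_{n\in\mathbb{Z}}$ be Gaussian white noise with variance $\sigma^2 = \operatorname{Var}(\varepsilon_0)$. Then, for $d \in (0, 1/2)$, a FARIMA$(0,d,0)$ process $(\xi_n)_{n\in\mathbb{N}}$ is given by

$$\xi_n = \sum_{j=0}^{\infty}\frac{\Gamma(j+d)}{\Gamma(j+1)\,\Gamma(d)}\,\varepsilon_{n-j}.$$

According to Pipiras and Taqqu [44], Section 1.3, it has the spectral density

$$f(\lambda) = \frac{\sigma^2}{2\pi}\left|1 - e^{-i\lambda}\right|^{-2d} = |\lambda|^{D-1}\,\frac{\sigma^2}{2\pi}\left(\frac{|\lambda|}{|1 - e^{-i\lambda}|}\right)^{1-D}$$

with $D = 1 - 2d \in (0,1)$. As $|1 - e^{-i\lambda}| \le |\lambda|$, part 2 of Assumption 2 holds. For part 1 we have by Corollary 1.3.4 of [44] that

$$\gamma(k) = \sigma^2\,\frac{\Gamma(1-2d)}{\Gamma(1-d)\,\Gamma(d)}\,\frac{\Gamma(k+d)}{\Gamma(k-d+1)}.$$

Recall that by the Stirling formula $\Gamma(x) = \left(\frac{2\pi}{x}\right)^{1/2}\left(\frac{x}{e}\right)^{x}\left(1 + O(x^{-1})\right)$. Consequently,

$$\frac{\Gamma(k+d)}{\Gamma(k-d+1)} = e^{1-2d}\left(\frac{k-d+1}{k+d}\right)^{1/2} k^{2d-1}\exp\Big\{(k+d)\big(\log(k+d) - \log k\big) + (k-d+1)\big(\log k - \log(k-d+1)\big)\Big\}\left(1 + O(k^{-1})\right).$$

Using a Taylor expansion of $(k+d)(\log(k+d) - \log(k)) + (k-d+1)(\log(k) - \log(k-d+1))$, it easily follows that

$$\gamma(k) = k^{-D} L_\gamma(k)$$

with $L_\gamma(k) = C + O(1/k)$ for some constant $C$. Part 1 of Assumption 2 follows in the same way as in Example 2.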
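The FARIMA covariance from Corollary 1.3.4 of [44] can be evaluated on the log scale with `math.lgamma` to avoid overflow, and one can check that $k^{D}\gamma(k)$ approaches $C = \sigma^2\Gamma(1-2d)/(\Gamma(1-d)\Gamma(d))$. This sketch uses only the Python standard library; the function names are illustrative.

```python
import math


def farima_autocov(k, d, sigma2=1.0):
    """gamma(k) = sigma^2 Gamma(1-2d) Gamma(k+d) / (Gamma(1-d) Gamma(d) Gamma(k-d+1))."""
    k = abs(k)
    const = sigma2 * math.gamma(1 - 2 * d) / (math.gamma(1 - d) * math.gamma(d))
    # ratio Gamma(k+d) / Gamma(k-d+1) on the log scale to avoid overflow for large k
    return const * math.exp(math.lgamma(k + d) - math.lgamma(k - d + 1))


def limit_constant(d, sigma2=1.0):
    """C = sigma^2 Gamma(1-2d) / (Gamma(1-d) Gamma(d)), the limit of k^D gamma(k)."""
    return sigma2 * math.gamma(1 - 2 * d) / (math.gamma(1 - d) * math.gamma(d))


d = 0.3
D = 1 - 2 * d
for k in (10, 1000, 100000):
    print(k, k ** D * farima_autocov(k, d) / limit_constant(d))  # tends to 1
```

As a sanity check, $\gamma(1)/\gamma(0) = d/(1-d)$, the familiar lag-one autocorrelation of a FARIMA$(0,d,0)$ process.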


It would be interesting to know if the sampling window method is also consistent for long range dependent linear processes and general statistics without the assumption of Gaussianity. However, this seems to be a very difficult problem and is beyond the scope of this article.


3. Applications 

3.1. Robust, Self-Normalized Change-Point Test. In this paper, the main motivation for considering subsampling procedures in order to approximate the distribution of test statistics is to avoid the estimation of unknown parameters. As an example we will consider a self-normalized test statistic that can be applied to detect changes in the mean of long range dependent and heavy-tailed time series.

Given observations $X_1, \dots, X_n$ with $X_i = \mu_i + G(\xi_i)$, we are concerned with a decision on the change-point problem

$$H: \mu_1 = \dots = \mu_n$$

against

$$A: \mu_1 = \dots = \mu_k \ne \mu_{k+1} = \dots = \mu_n \quad \text{for some } k \in \{1, \dots, n-1\}.$$

Under the hypothesis $H$ we assume that the data generating process $(X_n)_{n\in\mathbb{N}}$ is stationary, while under the alternative $A$ there is a change in location at an unknown point in time. This problem has been widely studied: Csörgő and Horváth [21] give an overview of parametric and non-parametric methods that can be applied in order to detect change-points in independent data.
Many commonly used testing procedures are based on Cusum (cumulative sum)
test statistics, but when applied to data sets generated by long range dependent 
processes, these change-point tests often falsely reject the hypothesis of no change 
in the mean (see also Baek and Pipiras [3]). Furthermore, the performance of 
Cusum-like change-point tests is sensitive to outliers in the data. 

In contrast, testing procedures that are based on rank statistics have the advantage of not being sensitive to outliers in the data. Rank-based tests were introduced by Antoch, Hušková, Janic and Ledwina [5] for detecting changes in the distribution function of independent random variables. Wilcoxon-type rank tests have been studied by Wang [?] in the presence of linear long memory time series and by Dehling, Rooch and Taqqu [22] for subordinated Gaussian sequences.

Note that the normalization of the Wilcoxon change-point test statistic as proposed in [22] depends on the slowly varying function $L_\gamma$, the LRD parameter $D$ and the Hermite rank $r$ of the class of functions $1_{\{X_i \le x\}} - F(x)$, $x \in \mathbb{R}$. Although many authors assume $r = 1$ and while there are well-tried methods to estimate $D$, estimating $L_\gamma$ does not seem to be an easy task. For this reason, the Wilcoxon change-point test does not seem to be suitable for applications to real data.

To avoid these issues, Betken [?] proposes an alternative normalization for the Wilcoxon change-point test. This normalization approach was originally established by Lobato [?] for deciding on the hypothesis that a short range dependent stochastic process is uncorrelated up to a lag of a certain order. In change-point analysis, the normalization has recently been applied to several test statistics: Shao and Zhang [?] define a self-normalized Kolmogorov-Smirnov test statistic that serves to identify changes in the mean of short range dependent time series. Shao [48] adopted the normalization so as to define an alternative normalization for a Cusum test which detects changes in the mean of short range dependent as well as long range dependent time series.

For the definition of the self-normalized Wilcoxon test statistic, we introduce the ranks $R_i := \operatorname{rank}(X_i) = \sum_{j=1}^{n} 1_{\{X_j \le X_i\}}$ for $i = 1, \dots, n$. It seems natural to transfer the normalization that has been used in [48] to the Cusum test statistic of the ranks in order to establish a self-normalized version of the Wilcoxon test statistic, which is robust to outliers in the data. Therefore, the corresponding two-sample test statistic is defined by



where 



The self-normalized Wilcoxon change-point test rejects the hypothesis for large values of $\max_{k\in\{\lfloor n\tau_1\rfloor, \dots, \lfloor n\tau_2\rfloor\}} |G_n(k)|$, where $0 < \tau_1 < \tau_2 < 1$. The proportion of the data that is included in the calculation of the supremum is restricted by $\tau_1$ and $\tau_2$. A common choice for these parameters is $\tau_1 = 1 - \tau_2 = 0.15$; see Andrews [2].
For long range dependent subordinated Gaussian processes $(X_n)_{n\in\mathbb{N}}$, the asymptotic distribution of the test statistic under the hypothesis $H$ can be derived by the continuous mapping theorem (see Theorem 1 in [12]):

$$T_n(\tau_1,\tau_2) := \max_{k\in\{\lfloor n\tau_1\rfloor, \dots, \lfloor n\tau_2\rfloor\}} |G_n(k)| \Rightarrow \sup_{\lambda\in[\tau_1,\tau_2]} \frac{|Z_r(\lambda) - \lambda Z_r(1)|}{\left\{\int_0^{\lambda}\left(Z_r(t) - \frac{t}{\lambda}Z_r(\lambda)\right)^2 dt + \int_0^{1-\lambda}\left(Z_r^*(t) - \frac{t}{1-\lambda}Z_r^*(1-\lambda)\right)^2 dt\right\}^{1/2}}.$$


Here, $Z_r$ is an $r$-th order Hermite process with Hurst parameter $H := \max\{1 - \frac{rD}{2}, \frac{1}{2}\}$ and $Z_r^*(t) = Z_r(1) - Z_r(1-t)$. A comparison of $T_n(\tau_1,\tau_2)$ with the critical values of its limit distribution still presupposes the determination of these parameters. We can bypass the estimation of $D$ and $r$ by applying the subsampling procedure since, due to the convergence of $T_n(\tau_1,\tau_2)$, Assumption 1 holds.
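The following NumPy sketch computes $T_n(\tau_1,\tau_2)$ under an assumed Shao-type form of $G_n(k)$, chosen to mirror the limit above: numerator $\sum_{i=1}^{k}R_i - \frac{k}{n}\sum_{i=1}^{n}R_i$ and denominator $\{\frac{1}{n}\sum_{t=1}^{k}S_t(1,k)^2 + \frac{1}{n}\sum_{t=k+1}^{n}S_t(k+1,n)^2\}^{1/2}$ with $S_t(j,k) = \sum_{h=j}^{t}(R_h - \bar{R}_{j,k})$. Treat it as an illustration of the self-normalization idea rather than as the authors' exact definition; the names `ranks`, `_self_normalizer` and `sn_wilcoxon` are hypothetical.

```python
import numpy as np


def ranks(x, midranks=False):
    """R_i = #{j : X_j <= X_i}; with midranks=True, ties count 1/2 (modified ranks)."""
    x = np.asarray(x, dtype=float)
    less = (x[None, :] < x[:, None]).sum(axis=1)
    equal = (x[None, :] == x[:, None]).sum(axis=1)
    return less + 0.5 * equal if midranks else (less + equal).astype(float)


def _self_normalizer(r, k):
    """sqrt((1/n) sum_{t<=k} S_t(1,k)^2 + (1/n) sum_{t>k} S_t(k+1,n)^2)."""
    left, right = r[:k], r[k:]
    s_left = np.cumsum(left - left.mean())     # S_t(1, k),   t = 1, ..., k
    s_right = np.cumsum(right - right.mean())  # S_t(k+1, n), t = k+1, ..., n
    return np.sqrt((np.sum(s_left ** 2) + np.sum(s_right ** 2)) / len(r))


def sn_wilcoxon(x, tau1=0.15, tau2=0.85, midranks=False):
    """T_n(tau1, tau2) = max over k in {floor(n tau1), ..., floor(n tau2)} of |G_n(k)|."""
    r = ranks(x, midranks)
    n = len(r)
    csum, total = np.cumsum(r), r.sum()
    k_lo = max(1, int(np.floor(n * tau1)))
    k_hi = min(n - 1, int(np.floor(n * tau2)))  # keep both segments non-empty
    best = 0.0
    for k in range(k_lo, k_hi + 1):
        num = csum[k - 1] - k / n * total  # sum_{i<=k} R_i - (k/n) sum_i R_i
        den = _self_normalizer(r, k)
        if den > 0:
            best = max(best, abs(num) / den)
    return float(best)


rng = np.random.default_rng(0)
e = rng.standard_normal(200)
shifted = e + 5.0 * (np.arange(200) >= 100)
print(sn_wilcoxon(e), sn_wilcoxon(shifted))
```

Because only ranks enter, a strictly increasing transformation of the data (such as $\exp$) leaves the statistic unchanged, which is the robustness property exploited for heavy-tailed data; `midranks=True` switches to the modified ranks for tied data described below.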

Note that even under the alternative $A$ (change in location), we have to find the quantiles of the distribution under the hypothesis (stationarity). As the block length $l$ is much shorter than the sample size $n$, most blocks will not be contaminated by the change-point, so that the distribution of the test statistic will not change much. The accuracy and the power of the test will be investigated in a simulation study in Section 4.

If the distribution of $X_i$ is not continuous, there might be ties in the data and consideration of the ranks $R_i = \sum_{j=1}^{n} 1_{\{X_j \le X_i\}}$ may not be appropriate. In this case we propose to use a modified statistic based on the modified ranks $\tilde{R}_i = \sum_{j=1}^{n}\left(1_{\{X_j < X_i\}} + \frac{1}{2}1_{\{X_j = X_i\}}\right)$. For the convergence of the corresponding self-normalized change-point test see Appendix A.

The test statistic $T_n(\tau_1,\tau_2)$ is designed for the detection of a single change-point. An extension of the testing procedure that allows for multiple change-points is possible by adapting Shao's testing procedure, which takes this problem into consideration (see [48]). For convenience, we describe the construction of the modified test statistic in the case of two change-points. The general idea consists in dividing the sample $X_1, \dots, X_n$ according to the pair $(k_1, k_2)$ of potential change-point locations and computing the original test statistic with respect to the subsamples $X_1, \dots, X_{k_2}$ and $X_{k_1+1}, \dots, X_n$. We reject the hypothesis for large values of the sum of the corresponding single statistics.

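For $\varepsilon \in (0, \tau_2 - \tau_1)$ define $T_n(\tau_1,\tau_2,\varepsilon) := \sup_{(k_1,k_2)\in\Omega_n(\tau_1,\tau_2,\varepsilon)} |G_n(k_1,k_2)|$, where $\Omega_n(\tau_1,\tau_2,\varepsilon) := \{(k_1,k_2) : \lfloor n\tau_1\rfloor \le k_1 < k_2 \le \lfloor n\tau_2\rfloor,\ k_2 - k_1 \ge \lfloor n\varepsilon\rfloor\}$ and

$$G_n(k_1,k_2) := \frac{\sum_{i=1}^{k_1}R_i^{(1)} - \frac{k_1}{k_2}\sum_{i=1}^{k_2}R_i^{(1)}}{\left\{\frac{1}{n}\sum_{t=1}^{k_1}\left(S_t^{(1)}(1,k_1)\right)^2 + \frac{1}{n}\sum_{t=k_1+1}^{k_2}\left(S_t^{(1)}(k_1+1,k_2)\right)^2\right\}^{1/2}} + \frac{\sum_{i=k_1+1}^{k_2}R_i^{(2)} - \frac{k_2-k_1}{n-k_1}\sum_{i=k_1+1}^{n}R_i^{(2)}}{\left\{\frac{1}{n}\sum_{t=k_1+1}^{k_2}\left(S_t^{(2)}(k_1+1,k_2)\right)^2 + \frac{1}{n}\sum_{t=k_2+1}^{n}\left(S_t^{(2)}(k_2+1,n)\right)^2\right\}^{1/2}},$$

where

$$R_i^{(1)} := \sum_{j=1}^{k_2} 1_{\{X_j \le X_i\}}, \qquad R_i^{(2)} := \sum_{j=k_1+1}^{n} 1_{\{X_j \le X_i\}},$$

$$S_t^{(h)}(j,k) := \sum_{i=j}^{t}\left(R_i^{(h)} - \bar{R}_{j,k}^{(h)}\right) \quad \text{with} \quad \bar{R}_{j,k}^{(h)} := \frac{1}{k-j+1}\sum_{t=j}^{k}R_t^{(h)}, \quad h = 1, 2.$$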


The distribution of the test statistic converges to a limit $T(r,\tau_1,\tau_2,\varepsilon)$ (see the Appendix), so subsampling can be applied. The critical values corresponding to the asymptotic distribution of the test statistic are reported in Table 1.


Table 1. Simulated critical values for the distribution of $T(1,\tau_1,\tau_2,\varepsilon)$ when $[\tau_1,\tau_2] = [0.15, 0.85]$ and $\varepsilon = 0.15$. The sample size is 1000, the number of replications is 10,000.

               10%      5%       1%
H = 0.501     17.79    19.76    24.13
H = 0.6       19.80    22.38    27.68
H = 0.7       22.08    24.95    30.46
H = 0.8       24.24    27.61    34.04
H = 0.9       26.50    30.11    37.78
H = 0.999     28.28    32.32    41.24


3.2. Data Examples. We will revisit some data sets which have been analyzed 
before in the literature. We will use the self-normalized Wilcoxon change-point test 
combined with subsampling and compare our findings to the conclusions of other 
authors. 

The plot in Figure 1 depicts the annual volume of discharge from the Nile river at Aswan in $10^8\,\mathrm{m}^3$ for the years 1871 to 1970. The data set has been analyzed for the detection of a change-point by numerous authors under differing assumptions concerning the data generating random process and by usage of diverse methods. Amongst others, Cobb [19], MacNeill, Tang and Jandhyala [41], Wu and Zhao [?] and Shao [55] provided statistically significant evidence for a decrease of the Nile's annual discharge towards the end of the 19th century. The construction of the Aswan Low Dam between 1898 and 1902 serves as a popular explanation for an abrupt change in the data.

The value of the self-normalized Wilcoxon test statistic computed with respect to the data is given by $T_n(\tau_1,\tau_2) = 13.48729$. For a level of significance of 5%, the self-normalized Wilcoxon change-point test rejects the hypothesis for every possible value of $H \in (\frac{1}{2}, 1)$. Furthermore, we approximate the distribution of the self-normalized Wilcoxon test statistic by the sampling window method with block size $l = \lfloor\sqrt{n}\rfloor = 10$. The subsampling-based test decision also indicates the existence of a change-point in the mean of the data, even if we consider the 99%-quantile of $F_{l,n}$.

In particular, previous analysis of the Nile data by Wu and Zhao [?] and Balke [7] suggests that the change in the discharge volume occurred in 1899. We applied the self-normalized Wilcoxon test and the sampling window method to the corresponding pre-break and post-break samples. Neither of these two approaches leads to rejection of the hypothesis, so that it seems reasonable to consider both samples as stationary. At this point, it is interesting to note that, based on the whole sample, local Whittle estimation with bandwidth parameter $m = \lfloor n^{2/3}\rfloor$ suggests the existence of long range dependence characterized by a Hurst parameter $\hat{H} = 0.962$, whereas the estimates for the pre-break and post-break samples, $\hat{H}_1 = 0.517$ and $\hat{H}_2 = 0.5$, respectively, should be considered as an indication of short range dependent data. In this regard, our findings support the conjecture of spurious long memory caused by a change-point and therefore coincide with the results of Shao [38].

Figure 1. Measurements of the annual discharge of the river Nile at Aswan in $10^8\,\mathrm{m}^3$ for the years 1871-1970. The dotted line indicates the location of the change-point; the dashed lines designate the sample means for the pre-break and post-break samples.
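Local Whittle estimation is used above without further details. The sketch below implements Robinson's local Whittle estimator under the usual objective $R(H) = \log\big(\frac{1}{m}\sum_{j=1}^{m}\lambda_j^{2H-1}I(\lambda_j)\big) - (2H-1)\frac{1}{m}\sum_{j=1}^{m}\log\lambda_j$ over the Fourier frequencies $\lambda_j = 2\pi j/n$, with default bandwidth $m = \lfloor n^{2/3}\rfloor$ and a grid search in place of a numerical optimizer. It assumes NumPy, the name `local_whittle_H` is hypothetical, and it is not necessarily the implementation behind the estimates reported here.

```python
import numpy as np


def local_whittle_H(x, m=None):
    """Local Whittle estimate of H, minimizing Robinson's objective on a grid."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    if m is None:
        m = int(np.floor(n ** (2 / 3)))  # bandwidth m = floor(n^(2/3))
    lam = 2 * np.pi * np.arange(1, m + 1) / n  # Fourier frequencies lambda_1..lambda_m
    dft = np.fft.fft(x - x.mean())[1:m + 1]
    periodogram = np.abs(dft) ** 2 / (2 * np.pi * n)
    mean_log_lam = np.mean(np.log(lam))
    grid = np.linspace(0.01, 1.49, 1481)  # candidate values of H, step 0.001

    def objective(h):
        return np.log(np.mean(lam ** (2 * h - 1) * periodogram)) - (2 * h - 1) * mean_log_lam

    values = [objective(h) for h in grid]
    return float(grid[int(np.argmin(values))])


rng = np.random.default_rng(1)
print(local_whittle_H(rng.standard_normal(4096)))  # close to 0.5 for white noise
```

Applied to a series with a level shift, such an estimator can return values well above 1/2 even when each segment is short range dependent, which is the spurious long memory effect discussed above.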

The second data set consists of the seasonally adjusted monthly deviations of the temperature (degrees C) for the northern hemisphere during the years 1854 to 1989 from the monthly averages over the period 1950 to 1979. The data result from spatial averaging of temperatures measured over land and sea. At first sight, the plot in Figure 2 may suggest an increasing trend as well as an abrupt change of the temperature deviations. Statistical evidence for a positive deterministic trend implies affirmation of the conjecture that there has been global warming during the last decades.

In scientific discourse, the question of whether the northern hemisphere temperature data acts as an indicator for global warming of the atmosphere is a controversial issue. Deo and Hurvich [23] provided some indication for global warming by fitting a linear trend to the data. Beran and Feng [?] considered a more general stochastic model by assuming so-called semiparametric fractional autoregressive (SEMIFAR) processes. Their method did not deliver sufficient statistical evidence for a deterministic trend. Wang [?] applied another method for the detection of gradual change to the global temperature data and did not detect an increasing trend, either. Nonetheless, he offers an alternative explanation for the occurrence of trend-like behavior by pointing out that it may have been generated by stationary long range dependent processes. In contrast, it is shown in Shao [?] that the existence of a change-point in the mean yields yet another explanation for the behavior of the data.

Figure 2. Monthly temperature of the northern hemisphere for the years 1854-1989 from the data base held at the Climate Research Unit of the University of East Anglia, Norwich, England. The temperature anomalies (in degrees C) are calculated with respect to the reference period 1950-1979. The dotted line indicates the location of the potential change-point; the dashed lines designate the sample means for the pre-break and post-break samples.

The value of the self-normalized Wilcoxon test statistic computed with respect to the data is given by $T_n(\tau_1,\tau_2) = 18.98636$. Consequently, the self-normalized Wilcoxon change-point test would reject the hypothesis for every possible value of $H \in (\frac{1}{2}, 1)$ at a level of significance of 1%. In addition, an application of the sampling window method with respect to the self-normalized Wilcoxon test statistic, based on a comparison of $T_n(\tau_1,\tau_2)$ with the 99%-quantile of the sampling distribution $F_{l,n}$, yields a test decision in favor of the alternative hypothesis for any choice of the block length $l \in \{\lfloor n^{\gamma}\rfloor \mid \gamma = 0.3, 0.4, \dots, 0.9\} = \{9, 19, 40, 84, 177, 371, 778\}$. All in all, both testing procedures provide strong evidence for the existence of a change in the mean.
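The block-length grid is simply $\lfloor n^{\gamma}\rfloor$; for the monthly series from 1854 to 1989 one has $n = 136 \cdot 12 = 1632$ (assuming no missing months), which reproduces the values listed above. The second helper returns the upper bound $(1+D)/2$ on the exponent that Assumption 3 allows. Both function names are illustrative.

```python
import math


def block_lengths(n, gammas):
    """Block lengths l = floor(n ** gamma) for the given exponents."""
    return [math.floor(n ** g) for g in gammas]


def exponent_bound(D):
    """Assumption 3 requires l = O(n^((1+D)/2 - eps)); return (1 + D) / 2."""
    return (1 + D) / 2


gammas = [g / 10 for g in range(3, 10)]  # 0.3, 0.4, ..., 0.9
print(block_lengths(136 * 12, gammas))    # [9, 19, 40, 84, 177, 371, 778]
print(exponent_bound(2 - 2 * 0.811))      # about 0.689 if r = 1, so that D = 2 - 2H
```

For example, with $r = 1$ and $D = 2 - 2H$, a Hurst parameter of 0.811 gives the bound $(1 + 0.378)/2 = 0.689$.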

According to Shao [55] the change-point is located around October 1924. Based on the whole sample, local Whittle estimation with bandwidth $m = \lfloor n^{2/3}\rfloor$ provides an estimate $\hat{H} = 0.811$. The estimated Hurst parameters for the pre-break and post-break samples are $\hat{H}_1 = 0.597$ and $\hat{H}_2 = 0.88$, respectively. Neither of the two testing procedures, i.e. subsampling with respect to the self-normalized Wilcoxon test statistic and comparison of the value of $T_n(\tau_1,\tau_2)$ with the corresponding critical values of its limit distribution, provides evidence for another change-point in the pre-break or post-break sample.

Moreover, computation of the test statistic that allows for two change-point locations yields $T_n(\tau_1,\tau_2,\varepsilon) = 17.88404$ (for $\tau_1 = 1 - \tau_2 = \varepsilon = 0.15$), i.e. compared to the values in Table 1, the test statistic only surpasses the critical value corresponding to $H = 0.501$ and a significance level of 10%, but does not exceed any of the other values. Subsampling with respect to the test statistic $T_n(\tau_1,\tau_2,\varepsilon)$ does not support the conjecture of two changes, either. Admittedly, subsampling leads to a rejection of the hypothesis when the block length equals $l = \lfloor n^{0.7}\rfloor = 177$ (based on a comparison of $T_n(\tau_1,\tau_2,\varepsilon)$ with the 95%-quantile of the corresponding sampling distribution $F_{l,n}$), but it yields a test decision in favor of the hypothesis for block lengths $l \in \{\lfloor n^{\gamma}\rfloor \mid \gamma = 0.5, 0.6, 0.8, 0.9\} = \{40, 84, 371, 778\}$ and for comparison with the 90%-quantile of $F_{l,n}$.

Therefore, it seems safe to conclude that the appearance of long memory in the post-break sample is not caused by another change-point in the mean. The pronounced difference between the local Whittle estimates $\hat{H}_1$ and $\hat{H}_2$ suggests a change in the dependence structure of the time series. Another explanation might be a gradual change of the temperature in the post-break period. We conjecture that our test has only low power in the case of a gradual change, because the denominator of our self-normalized test statistic is inflated as the ranks systematically deviate from the mean rank of the first and second part. When using subsampling, the trend also appears in the subsamples, so that we fail to approximate the distribution under the hypothesis.

As pointed out by one of the referees, the Northern hemisphere temperature data 
does not seem to be second-order stationary; the variance in the first part of the 
time series seems to be higher. Such a change in variance should also result in a 
loss of power. The reason is that the ranks in the part with the higher variance are 
more extreme, so that the distance to the mean rank of this part is larger. This 
leads to a higher value of the denominator of our self-normalized test statistic and 
consequently to a lower value of the ratio. 

The third data set consists of the arrival rate of Ethernet data (bytes per 10 milliseconds) from a local area network (LAN) measured at Bellcore Research and Engineering Center in 1989. For more information on the LAN traffic monitoring we refer to Leland and Wilson [35] and Beran [?]. Figure 3 reveals that the observations are strongly right-skewed. As the self-normalized Wilcoxon test is based on ranks, we do not expect that this will affect our analysis.

Coulon, Chabert and Swami [20] examined this data set for change-points before. 
The method proposed in their paper is based on the assumption that a FARIMA 
model holds for segments of the data. The number of different sections and the location of the change-points are chosen by a model selection criterion. The algorithm proposed by Coulon et al. [20] detects multiple changes in the parameters of the corresponding FARIMA time series.

Figure 3. Ethernet traffic in bytes per 10 milliseconds from a LAN measured at Bellcore Research and Engineering Center.

In contrast, an application of the self-normalized Wilcoxon change-point test does not provide evidence for a change-point in the mean: the value of the test statistic is $T_n(\tau_1,\tau_2) = 3.270726$, i.e. even for a level of significance of 10%, the self-normalized Wilcoxon change-point test does not reject the hypothesis for any value $H \in (\frac{1}{2}, 1)$. Furthermore, subsampling with respect to the self-normalized Wilcoxon test statistic does not lead to a rejection of the hypothesis either (for any choice of block length $l \in \{\lfloor n^{\gamma}\rfloor \mid \gamma = 0.3, 0.4, \ldots, 0.9\} = \{12, 27, 63, 144, 332, 761, 1745\}$ and for comparison with the 90%-quantile of the corresponding sampling distribution $F_{l,n}$).

Taking into consideration that the data set contains ties (the value 0 appears several times), we also applied the self-normalized Wilcoxon test statistic based on the modified ranks $\tilde R_i$ and used subsampling with respect to this statistic. Neither approach led to a rejection of the hypothesis.
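For data with ties, such as the repeated zeros in the Ethernet series, modified ranks can be computed directly. This sketch assumes the definition $\tilde R_i = \sum_{j=1}^{n}\big(1_{\{X_j<X_i\}} + \frac12 1_{\{X_j=X_i\}}\big)$ (cf. Appendix A). It needs O(n²) memory, which is fine for a few thousand observations, and the function name is hypothetical. Tied observations receive the same value. Without ties, $\tilde R_i$ equals the usual rank minus 1/2. That constant shift cancels in the centered numerator and in the centered partial sums of the self-normalized statistic.

```python
import numpy as np


def modified_ranks(x):
    """R~_i = sum_j (1{X_j < X_i} + 0.5 * 1{X_j = X_i}), the sum running over j = 1, ..., n."""
    x = np.asarray(x, dtype=float)
    smaller = (x[None, :] < x[:, None]).sum(axis=1)  # entry [i, j] is X_j < X_i
    equal = (x[None, :] == x[:, None]).sum(axis=1)   # includes j = i
    return smaller + 0.5 * equal


print(modified_ranks([0.0, 0.0, 1.0]).tolist())  # [1.0, 1.0, 2.5]
```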

An application of the test statistic constructed for the detection of two changes yields a value of $T_n(\tau_1,\tau_2,\epsilon) = 15.24527$ when $\epsilon = \tau_1 = 1 - \tau_2 = 0.15$. Clearly, this does not lead to a rejection of the hypothesis for any value of the parameter $H$. In addition, subsampling based on comparison of $T_n(\tau_1,\tau_2,\epsilon)$ with the 90%-quantile of the corresponding sampling distribution $F_{l,n}$ does not provide evidence for multiple changes in the data for any block length $l \in \{\lfloor n^{\gamma}\rfloor \mid \gamma = 0.5, 0.6, 0.7, 0.8\} = \{63, 144, 332, 761\}$, either.

These results differ from the analysis of the previous authors. On the one hand, this may be due to the fact that the applied methods differ considerably from the testing procedures applied before. On the other hand, the change-point estimation algorithm proposed in Coulon, Chabert and Swami [20] is not robust to skewness or heavy-tailed distributions and decisively relies on the assumption of FARIMA time series. This assumption, however, seems to contradict observations made by Bhansali and Kokoszka [?] as well as Taqqu and Teverovsky [54], who stress that the model that fits the Ethernet traffic data is very unlikely to be FARIMA.

Estimation of the Hurst parameter by the local Whittle procedure with bandwidth parameter $m = \lfloor n^{2/3}\rfloor$ yields an estimate of $\hat H = 0.845$ and therefore indicates the existence of long range dependence. This is consistent with the results of Leland et al. [?] and Taqqu and Teverovsky [54].

In the three data examples, we find that the results obtained by subsampling 
and by parameter estimation are in good accordance with each other. The methods 
take into account long range dependence or heavy tails, but still detect a change in 
location in the first two examples. For the third data example our analysis supports 
the hypothesis of stationarity. 


4. Simulations 


We now investigate the finite sample performance of the subsampling procedure with respect to the self-normalized Wilcoxon test and with respect to the classical Wilcoxon change-point test. Moreover, we compare these results to the performance of the tests when the test decision is based on critical values obtained from the asymptotic distribution of the test statistic.

For this purpose, we consider subordinated Gaussian time series $(X_n)_{n\in\mathbb{N}}$, $X_n = G(\xi_n)$, where $(\xi_n)_{n\in\mathbb{N}}$ is fractional Gaussian noise (introduced in Examples 1 and 2) with Hurst parameter $H \in \{0.6, 0.7, 0.8, 0.9\}$ and covariance function

$$\gamma(k) = \frac{1}{2}\big(|k+1|^{2H} - 2|k|^{2H} + |k-1|^{2H}\big) \sim H(2H-1)k^{-D},$$

where $D = 2 - 2H$. Initially, we take $G(t) = t$, so that $(X_n)_{n\in\mathbb{N}}$ has standard normal marginal distributions. We also consider the transformation



(with $\Phi$ denoting the standard normal distribution function) so as to generate Pareto-distributed data with parameters $k, \beta > 0$ (referred to as Pareto($\beta$, $k$)). In both cases, the Hermite rank $r$ of the class of functions $1_{\{G(\xi_1)\le x\}} - F(x)$, $x \in \mathbb{R}$, equals $r = 1$ and



see [22].
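The integral $\int J_1(x)\,dF(x)$ that enters the limit distribution can be computed explicitly when $G$ is strictly increasing and continuous. This covers $G(t) = t$ and any increasing version of the Pareto transformation; whether the transformation used here is increasing is our assumption. The derivation below is our own, not a reproduction of the display from [22]. It uses $\varphi'(t) = -t\varphi(t)$, $F(x) = \Phi(G^{-1}(x))$ and the substitution $x = G(t)$:

$$
J_1(x) = E\big[\xi_1 1_{\{G(\xi_1)\le x\}}\big] = \int_{-\infty}^{G^{-1}(x)} t\,\varphi(t)\,dt = -\varphi\big(G^{-1}(x)\big),
\qquad
\int J_1(x)\,dF(x) = -\int_{-\infty}^{\infty}\varphi(t)^2\,dt = -\frac{1}{2\sqrt{\pi}} \approx -0.2821.
$$

Since this value is nonzero, the Hermite rank is indeed $r = 1$. For a strictly decreasing $G$ the same computation gives $+1/(2\sqrt{\pi})$.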


Under the above conditions, the critical values of the asymptotic distribution of the self-normalized Wilcoxon test statistic are reported in Table 2 in [12]. The limit of the Wilcoxon change-point test statistic can be found in [22]; the corresponding critical values can be taken from Table 1 in [?].
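The simulation design described above can be reproduced, for instance, with the circulant-embedding (Davies–Harte) method for fractional Gaussian noise. This Python sketch rests on our own assumptions, since the paper does not state which generator was used. The Pareto transformation is written as $G(t) = \beta(1-\Phi(t))^{-1/k}$. This is one increasing choice, and it yields $P(X > x) = (\beta/x)^{k}$ for $x \ge \beta$. All function names are hypothetical.

```python
import math

import numpy as np


def fgn_autocov(k, hurst):
    """gamma(k) = (|k+1|^{2H} - 2|k|^{2H} + |k-1|^{2H}) / 2 for fractional Gaussian noise."""
    k = np.abs(np.asarray(k, dtype=float))
    return 0.5 * ((k + 1) ** (2 * hurst) - 2 * k ** (2 * hurst) + np.abs(k - 1) ** (2 * hurst))


def simulate_fgn(n, hurst, rng):
    """One path of fGn of length n via circulant embedding (Davies-Harte)."""
    gamma = fgn_autocov(np.arange(n + 1), hurst)
    first_row = np.concatenate([gamma, gamma[n - 1:0:-1]])  # circulant of size 2n
    eigenvalues = np.fft.fft(first_row).real
    if eigenvalues.min() < -1e-10:
        raise ValueError("circulant embedding is not nonnegative definite")
    m = len(first_row)
    z = rng.standard_normal(m) + 1j * rng.standard_normal(m)
    path = np.fft.fft(np.sqrt(np.clip(eigenvalues, 0.0, None) / m) * z)
    return path.real[:n]  # the imaginary part would be a second, independent path


def pareto_transform(xi, beta=3.0, k=1.0):
    """Assumed G(t) = beta * (1 - Phi(t))^(-1/k), giving Pareto(beta, k) marginals."""
    upper_tail = np.array([0.5 * math.erfc(v / math.sqrt(2.0)) for v in np.asarray(xi, dtype=float)])
    return beta * upper_tail ** (-1.0 / k)


rng = np.random.default_rng(42)
xi = simulate_fgn(500, hurst=0.8, rng=rng)  # G(t) = t: standard normal marginals
x_pareto = pareto_transform(xi)             # Pareto(3, 1) marginals
```

Because the Wilcoxon-type statistics are rank-based, an increasing transformation changes the marginals but leaves the test statistic under the hypothesis unchanged.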

The rejection frequencies of both testing procedures are reported in Tables 2 and 3 for the self-normalized Wilcoxon change-point test and in Tables 4 and 5 for the classical Wilcoxon test (without self-normalization). The calculations are based on 5,000 realizations of time series with sample sizes $n = 300$ and $n = 500$. We have chosen block lengths $l = l_n = \lfloor n^{\gamma}\rfloor$ with $\gamma \in \{0.4, 0.5, 0.6\}$. As level of significance we chose 5%, i.e. we compare the values of the test statistic with the corresponding critical values of its asymptotic distribution and with the corresponding quantile of the empirical distribution function of the subsample statistics, respectively.
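The block lengths $l = \lfloor n^{\gamma}\rfloor$ can be checked with a few lines of Python (a trivial sketch; the helper name is ours). They agree with the $l$ column of Tables 2 to 5:

```python
import math


def block_length(n, gamma):
    """l = floor(n ** gamma)."""
    return math.floor(n ** gamma)


for n in (300, 500):
    print(n, [block_length(n, gamma) for gamma in (0.4, 0.5, 0.6)])
# 300 [9, 17, 30]
# 500 [12, 22, 41]
```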

For the usual testing procedures, we neglect the estimation of the Hermite rank $r$, the slowly varying function $L_\gamma$ and the integral $\int J_1(x)\,dF(x)$, i.e. these quantities are treated as known. Yet, for every simulated time series we estimate the Hurst parameter $H$ by the local Whittle estimator $\hat H$ proposed in Künsch [35]. This estimator is based on an approximation of the spectral density by the periodogram at the Fourier frequencies. It depends on the bandwidth parameter $m = m(n)$, which denotes the number of Fourier frequencies used for the estimation. If the bandwidth $m$ satisfies $\frac{1}{m} + \frac{m}{n} \to 0$ as $n \to \infty$, the local Whittle estimator is a consistent estimator for $H$; see Robinson [47]. For convenience, we always choose $m = \lfloor n^{2/3}\rfloor$ in this article. The critical values corresponding to the estimated values of $H$ are determined by linear interpolation.
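A compact version of the local Whittle estimator with $m = \lfloor n^{2/3}\rfloor$ is sketched below. It minimizes Robinson's objective
$R(H) = \log\big(\frac{1}{m}\sum_{j=1}^{m}\lambda_j^{2H-1}I(\lambda_j)\big) - (2H-1)\frac{1}{m}\sum_{j=1}^{m}\log\lambda_j$.
Here $I$ is the periodogram at the Fourier frequencies $\lambda_j = 2\pi j/n$. The grid search (instead of a numerical optimizer), the grid itself and the function name are our own choices, not taken from the paper.

```python
import numpy as np


def local_whittle_hurst(x, m=None, grid=None):
    """Local Whittle estimate of H: minimise R(H) over a grid of candidate values."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    if m is None:
        m = int(np.floor(n ** (2.0 / 3.0)))  # bandwidth used throughout the article
    lam = 2.0 * np.pi * np.arange(1, m + 1) / n         # Fourier frequencies lambda_1, ..., lambda_m
    dft = np.fft.fft(x - x.mean())[1:m + 1]
    periodogram = np.abs(dft) ** 2 / (2.0 * np.pi * n)  # I(lambda_j)
    if grid is None:
        grid = np.linspace(0.01, 0.99, 981)             # step 0.001
    mean_log_lam = np.mean(np.log(lam))

    def objective(h):
        return np.log(np.mean(lam ** (2 * h - 1) * periodogram)) - (2 * h - 1) * mean_log_lam

    values = np.array([objective(h) for h in grid])
    return float(grid[np.argmin(values)])


rng = np.random.default_rng(1)
white_noise = rng.standard_normal(4096)
h_hat = local_whittle_hurst(white_noise)  # should be close to 0.5 for white noise
```

The normalizing constant of the periodogram only shifts $R(H)$ by a constant, so it does not affect the minimizer.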

Under the alternative A we analyze the power of the testing procedures (the frequency of rejection) by considering different choices for the height of the level shift (denoted by $h$) and the location $\lfloor n\tau\rfloor$ of the change-point. In the tables, the columns headed by "$h = 0$" correspond to the frequency of a type I error, i.e. the rejection rate under the hypothesis H.
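The rejection frequencies in the tables are Monte Carlo estimates. Below is a minimal sketch of this bookkeeping. It assumes that the alternative adds a level shift of height $h$ to all observations with index $i > \lfloor n\tau\rfloor$. `simulate` and `test` are placeholders for a time series generator and a test decision, and the helper names are hypothetical.

```python
import numpy as np


def add_level_shift(x, h, tau):
    """Add h to X_i for all (1-based) indices i > floor(n * tau)."""
    y = np.array(x, dtype=float)
    y[int(np.floor(len(y) * tau)):] += h
    return y


def rejection_rate(simulate, test, h, tau, reps, rng):
    """Fraction of `reps` simulated series, shifted by h after floor(n * tau), rejected by `test`."""
    rejections = 0
    for _ in range(reps):
        x = add_level_shift(simulate(rng), h, tau)
        rejections += bool(test(x))
    return rejections / reps
```

With `h = 0` the function returns the empirical size, corresponding to the "$h = 0$" columns. With `h > 0` it returns the empirical power.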

For the self-normalized Wilcoxon change-point test based on the asymptotic distribution, the empirical size almost equals the level of significance of 5% for normally distributed data (see Table 2). The sampling window method yields rejection rates that slightly exceed this level. For Pareto(3, 1) time series, both testing procedures lead to similar results and tend to reject the hypothesis too often when there is no change. With regard to the empirical power, it is notable that for fractional Gaussian noise the sampling window method yields considerably better power than the test based on asymptotic critical values. If Pareto(3, 1)-distributed time series are considered, the empirical power of the subsampling procedure is still better than the empirical power that results from using asymptotic critical values; in this case, however, the difference between the rejection rates is rather small. While the empirical size is not much affected by the Hurst parameter $H$, the empirical power is lower for $H = 0.8, 0.9$.

Considering the classical Wilcoxon test (without self-normalization), it is notable that for both procedures the empirical size is in most cases not close to the nominal level of significance (5%), ranging from 1.1% to 20.8% using subsampling and from 2.6% to 36.0% using asymptotic critical values. In general, the sampling window method becomes more conservative for higher values of the Hurst parameter $H$, while the test based on the asymptotic distribution becomes more liberal. Under the alternative, the usual application of the Wilcoxon test yields better power than the sampling window method, especially for high values of $H$. It should be emphasized that this comparison is problematic because the rejection frequencies under the hypothesis differ.

We conclude that the self-normalized Wilcoxon change-point test is more reliable than the classical change-point test. The reason might be that in the scaling of the classical test, the estimator $\hat H$ of the Hurst parameter enters as a power of the sample size $n$. Thus, a small error in this estimation might lead to a large error in the value of the test statistic. By using the sampling window method for the self-normalized version, we avoid the estimation of unknown parameters, and the performance is similar to that of the conventional procedure, which compares the value of the test statistic with the corresponding critical values of its asymptotic distribution.
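The sensitivity can be illustrated with a simplification that ignores constants and slowly varying factors. Suppose the classical statistic is normalized by a factor proportional to $n^{1+\hat H}$. Replacing $H$ by $\hat H = H + \delta$ then rescales the statistic by $n^{-\delta}$. Even moderate estimation errors therefore distort it noticeably for $n = 500$:

```python
n = 500
# factor by which the statistic is deflated (delta > 0) or inflated (delta < 0) by n^{-delta}
factors = {delta: n ** delta for delta in (0.02, 0.05, 0.10)}
for delta, factor in factors.items():
    print(delta, round(factor, 3))
# 0.02 1.132
# 0.05 1.364
# 0.1 1.862
```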

Note that in most cases covered by our simulations the choice of the block length for the subsampling procedure does not have a big impact on the frequency of a type I error. Considering the self-normalized Wilcoxon change-point test, an increase of the block length tends to go along with a decrease in power, especially for large values of the Hurst parameter $H$ and Pareto-distributed random variables. For smaller values of $H$ the effect is not pronounced. We recommend using a block length $\lfloor n^{0.4}\rfloor$ or $\lfloor n^{0.5}\rfloor$ for the self-normalized change-point test, as the choice $l = \lfloor n^{0.6}\rfloor$ implies worse properties in most cases.

An application of the subsampling procedure to the classical (non-self-normalized) Wilcoxon test for different choices of the block length shows the opposite effect on the rejection rate under the alternative: an increase of the block length results in a higher frequency of rejections. Here, the block length $\lfloor n^{0.6}\rfloor$ leads to better results in many cases. However, we recommend not using this test but self-normalizing the test statistic instead.

An alternative way of choosing the block length would be to apply the data-driven block selection rule proposed by Götze and Račkauskas [?] and Bickel and Sakov [15]. Although the algorithm was originally proposed for applications of the m-out-of-n bootstrap to independent and identically distributed data, it also led to satisfactory simulation results in applications to long range dependent time series (see [33]). Another general approach to the selection of the block size in the context of hypothesis testing is given by Algorithm 9.4.2 in Politis, Romano and Wolf [35].
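A Bickel–Sakov-type rule can be adapted to subsampling as follows, under our own simplifications. Take candidate block lengths $l_j = \lceil q^j n\rceil$ within a range $[l_{\min}, l_{\max}]$ and compute the subsampling distribution for each candidate. Choose the $l_j$ for which the Kolmogorov distance between the subsampling distributions for $l_j$ and $l_{j+1}$ is smallest. The parameters `q`, `l_min` and `l_max` and all function names are hypothetical. The rule is only meaningful for statistics whose scale does not depend on $l$, such as self-normalized ones.

```python
import numpy as np


def kolmogorov_distance(a, b):
    """sup_x |F_a(x) - F_b(x)| for the empirical distribution functions of samples a and b."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])  # the supremum is attained at sample points
    fa = np.searchsorted(a, grid, side="right") / len(a)
    fb = np.searchsorted(b, grid, side="right") / len(b)
    return float(np.max(np.abs(fa - fb)))


def select_block_length(x, statistic, q=0.75, l_min=5, l_max=None):
    """Choose l_j = ceil(q**j * n) minimising the distance between consecutive subsampling laws."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    l_max = n // 2 if l_max is None else l_max
    candidates = sorted({int(np.ceil(q ** j * n)) for j in range(1, 200)}, reverse=True)
    lengths = [l for l in candidates if l_min <= l <= l_max]
    if len(lengths) < 2:
        raise ValueError("need at least two candidate block lengths")
    laws = {l: np.array([statistic(x[i:i + l]) for i in range(n - l + 1)]) for l in lengths}
    gaps = [kolmogorov_distance(laws[a], laws[b]) for a, b in zip(lengths, lengths[1:])]
    return lengths[int(np.argmin(gaps))]


rng = np.random.default_rng(3)
x = rng.standard_normal(300)
l_hat = select_block_length(x, lambda block: np.sqrt(len(block)) * block.mean())
```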


Table 2. Rejection rates of the self-normalized Wilcoxon change-point test obtained by subsampling (left) with block length $l = \lfloor n^{\gamma}\rfloor$, $\gamma \in \{0.4, 0.5, 0.6\}$, and by comparison with asymptotic critical values (right) for fractional Gaussian noise of length $n$ with Hurst parameter $H$.

fGn            sampling window method             | asymptotic distribution
                      τ=0.25        τ=0.5         |        τ=0.25        τ=0.5
H    n    l    h=0    h=0.5  h=1    h=0.5  h=1    | h=0    h=0.5  h=1    h=0.5  h=1
0.6  300  9    0.041  0.263  0.700  0.502  0.952
          17   0.064  0.313  0.742  0.570  0.964  | 0.044  0.209  0.521  0.424  0.861
          30   0.070  0.322  0.705  0.555  0.943
     500  12   0.053  0.396  0.859  0.697  0.994
          22   0.060  0.421  0.861  0.720  0.995  | 0.049  0.303  0.687  0.577  0.958
          41   0.069  0.411  0.829  0.697  0.991

0.7  300  9    0.057  0.155  0.412  0.291  0.759
          17   0.070  0.171  0.423  0.313  0.763  | 0.053  0.108  0.268  0.228  0.611
          30   0.077  0.177  0.403  0.314  0.737
     500  12   0.056  0.183  0.513  0.382  0.856
          22   0.059  0.193  0.508  0.382  0.854  | 0.048  0.133  0.359  0.302  0.730
          41   0.065  0.192  0.476  0.387  0.819

0.8  300  9    0.070  0.126  0.251  0.223  0.526
          17   0.067  0.117  0.234  0.208  0.494  | 0.048  0.081  0.144  0.141  0.362
          30   0.073  0.114  0.218  0.201  0.466
     500  12   0.066  0.121  0.295  0.217  0.591
          22   0.068  0.114  0.278  0.210  0.567  | 0.053  0.085  0.198  0.163  0.462
          41   0.069  0.119  0.257  0.205  0.532

0.9  300  9    0.093  0.126  0.208  0.209  0.462
          17   0.074  0.097  0.161  0.169  0.397  | 0.057  0.065  0.106  0.125  0.308
          30   0.073  0.095  0.145  0.165  0.367
     500  12   0.079  0.105  0.194  0.185  0.461
          22   0.067  0.091  0.166  0.162  0.416  | 0.051  0.068  0.120  0.128  0.350
          41   0.063  0.087  0.146  0.152  0.391
Table 3. Rejection rates of the self-normalized Wilcoxon change-point test obtained by subsampling (left) with block length $l = \lfloor n^{\gamma}\rfloor$, $\gamma \in \{0.4, 0.5, 0.6\}$, and by comparison with asymptotic critical values (right) for Pareto(3, 1)-transformed fractional Gaussian noise of length $n$ with Hurst parameter $H$.

Pareto(3, 1)   sampling window method             | asymptotic distribution
                      τ=0.25        τ=0.5         |        τ=0.25        τ=0.5
H    n    l    h=0    h=0.5  h=1    h=0.5  h=1    | h=0    h=0.5  h=1    h=0.5  h=1
0.6  300  9    0.041  0.847  0.977  0.990  1.000
          17   0.067  0.871  0.946  0.990  1.000  | 0.056  0.820  0.912  0.984  0.999
          30   0.070  0.831  0.946  0.979  1.000
     500  12   0.055  0.947  0.997  0.999  1.000
          22   0.066  0.946  0.994  0.999  1.000  | 0.061  0.920  0.970  0.996  1.000
          41   0.071  0.921  0.976  0.996  1.000

0.7  300  9    0.057  0.571  0.821  0.990  0.994
          17   0.064  0.527  0.738  0.876  0.990  | 0.070  0.529  0.702  0.856  0.982
          30   0.077  0.527  0.738  0.842  0.975
     500  12   0.066  0.693  0.904  0.949  0.999
          22   0.068  0.684  0.893  0.942  0.998  | 0.076  0.663  0.820  0.940  0.995
          41   0.072  0.632  0.838  0.921  0.994

0.8  300  9    0.070  0.355  0.574  0.703  0.931
          17   0.068  0.284  0.454  0.666  0.905  | 0.072  0.297  0.428  0.640  0.875
          30   0.073  0.284  0.454  0.633  0.857
     500  12   0.064  0.401  0.609  0.738  0.948
          22   0.063  0.379  0.581  0.714  0.933  | 0.069  0.369  0.510  0.715  0.920
          41   0.064  0.345  0.509  0.688  0.903

0.9  300  9    0.093  0.253  0.396  0.597  0.832
          17   0.071  0.168  0.254  0.532  0.772  | 0.073  0.165  0.236  0.499  0.738
          30   0.073  0.168  0.254  0.482  0.729
     500  12   0.073  0.256  0.405  0.585  0.839
          22   0.064  0.219  0.340  0.547  0.802  | 0.068  0.199  0.296  0.529  0.782
          41   0.065  0.190  0.296  0.503  0.762
Table 4. Rejection rates of the classical Wilcoxon change-point test obtained by subsampling (left) with block length $l = \lfloor n^{\gamma}\rfloor$, $\gamma \in \{0.4, 0.5, 0.6\}$, and by comparison with asymptotic critical values (right) for fractional Gaussian noise of length $n$ with Hurst parameter $H$.

fGn            sampling window method             | asymptotic distribution
                      τ=0.25        τ=0.5         |        τ=0.25        τ=0.5
H    n    l    h=0    h=0.5  h=1    h=0.5  h=1    | h=0    h=0.5  h=1    h=0.5  h=1
0.6  300  9    0.066  0.20   0.232  0.386  0.591
          17   0.054  0.223  0.411  0.439  0.784  | 0.026  0.096  0.160  0.223  0.727
          30   0.059  0.264  0.529  0.663  0.870
     500  12   0.063  0.285  0.436  0.569  0.856
          22   0.058  0.345  0.663  0.627  0.952  | 0.036  0.148  0.256  0.378  0.897
          41   0.062  0.397  0.789  0.683  0.975

0.7  300  9    0.052  0.080  0.088  0.162  0.302
          17   0.049  0.095  0.158  0.206  0.466  | 0.035  0.067  0.228  0.167  0.665
          30   0.051  0.120  0.227  0.267  0.593
     500  12   0.042  0.104  0.153  0.249  0.539
          22   0.039  0.131  0.267  0.287  0.689  | 0.030  0.079  0.259  0.225  0.714
          41   0.046  0.160  0.373  0.343  0.789

0.8  300  9    0.028  0.030  0.031  0.054  0.092
          17   0.029  0.038  0.048  0.075  0.179  | 0.077  0.153  0.421  0.245  0.673
          30   0.034  0.057  0.088  0.070  0.272
     500  12   0.023  0.031  0.036  0.064  0.162
          22   0.028  0.044  0.070  0.097  0.273  | 0.050  0.112  0.439  0.226  0.714
          41   0.039  0.071  0.129  0.137  0.391

0.9  300  9    0.009  0.010  0.006  0.016  0.020
          17   0.009  0.014  0.009  0.021  0.060  | 0.36   0.484  0.739  0.524  0.830
          30   0.015  0.029  0.028  0.011  0.153
     500  12   0.008  0.006  0.003  0.015  0.026
          22   0.011  0.009  0.011  0.029  0.086  | 0.319  0.439  0.743  0.511  0.845
          41   0.021  0.021  0.032  0.058  0.197
Table 5. Rejection rates of the classical Wilcoxon change-point test obtained by subsampling (left) with block length $l = \lfloor n^{\gamma}\rfloor$, $\gamma \in \{0.4, 0.5, 0.6\}$, and by comparison with asymptotic critical values (right) for Pareto(3, 1)-transformed fractional Gaussian noise of length $n$ with Hurst parameter $H$.

Pareto(3, 1)   sampling window method             | asymptotic distribution
                      τ=0.25        τ=0.5         |        τ=0.25        τ=0.5
H    n    l    h=0    h=0.5  h=1    h=0.5  h=1    | h=0    h=0.5  h=1    h=0.5  h=1
0.6  300  9    0.170  0.949  0.742  0.991  0.923
          17   0.130  0.963  0.861  0.996  0.991  | 0.108  0.938  0.985  0.998  1.000
          30   0.109  0.962  0.871  0.998  0.998
     500  12   0.163  0.991  0.916  1.000  0.993
          22   0.132  0.997  0.976  1.000  0.999  | 0.128  0.988  0.999  1.000  1.000
          41   0.114  0.997  0.989  1.000  1.000

0.7  300  9    0.224  0.785  0.568  0.939  0.796
          17   0.175  0.802  0.680  0.955  0.949  | 0.179  0.833  0.969  0.974  0.999
          30   0.140  0.789  0.708  0.959  0.976
     500  12   0.208  0.921  0.763  0.989  0.956
          22   0.167  0.931  0.862  0.992  0.996  | 0.191  0.940  0.994  0.996  1.000
          41   0.143  0.925  0.891  0.994  0.998

0.8  300  9    0.203  0.508  0.326  0.743  0.565
          17   0.160  0.496  0.347  0.776  0.808  | 0.204  0.729  0.925  0.918  0.993
          30   0.137  0.484  0.364  0.791  0.881
     500  12   0.190  0.639  0.445  0.865  0.770
          22   0.160  0.649  0.513  0.886  0.929  | 0.212  0.805  0.963  0.948  0.999
          41   0.137  0.626  0.556  0.890  0.961

0.9  300  9    0.128  0.150  0.077  0.320  0.336
          17   0.097  0.128  0.071  0.403  0.550  | 0.309  0.712  0.901  0.848  0.966
          30   0.092  0.125  0.077  0.481  0.677
     500  12   0.112  0.159  0.089  0.402  0.436
          22   0.100  0.161  0.101  0.518  0.680  | 0.27   0.726  0.911  0.851  0.975
          41   0.095  0.170  0.106  0.571  0.771
5. Proofs

5.1. Auxiliary Results. 

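Lemma 1. Under Assumption 1 there is a constant $K_D < \infty$ such that for all $l \in \mathbb{N}$ and all $x_1, \ldots, x_l \in \mathbb{R}$ with $\operatorname{Var}\big(\sum_{i=1}^{l} x_i\xi_i\big) = 1$

$$\sum_{i=1}^{l} x_i^2 \le K_D.$$

Proof. Recall that we can rewrite the covariances as

$$\gamma(k) = \int_{-\pi}^{\pi} e^{ik\lambda} f(\lambda)\,d\lambda$$

and that the spectral density $f$ can be written as $f(\lambda) = L_f(|\lambda|)|\lambda|^{D-1}$. By our assumptions $L_f(x) \ge C_{\min}$ for a constant $C_{\min} > 0$, so that we can conclude that

$$\begin{aligned}
1 = \operatorname{Var}\Big(\sum_{i=1}^{l} x_i\xi_i\Big) &= \sum_{1\le j,k\le l} x_j x_k\gamma(j-k) = \sum_{1\le j,k\le l} x_j x_k\int_{-\pi}^{\pi} e^{i(j-k)\lambda} L_f(|\lambda|)|\lambda|^{D-1}\,d\lambda\\
&= 2\int_0^{\pi}\Big|\sum_{j=1}^{l} x_j e^{-ij\lambda}\Big|^2 L_f(\lambda)\lambda^{D-1}\,d\lambda \ge 2C_{\min}\pi^{D-1}\int_0^{\pi}\Big|\sum_{j=1}^{l} x_j e^{-ij\lambda}\Big|^2 d\lambda,
\end{aligned}$$

where the last step uses $\lambda^{D-1} \ge \pi^{D-1}$ for $\lambda \in (0, \pi]$, as $D < 1$. We rewrite the integrand as

$$\Big|\sum_{j=1}^{l} x_j e^{-ij\lambda}\Big|^2 = \sum_{1\le j,k\le l} x_j x_k e^{-i(j-k)\lambda} = \sum_{j=1}^{l} x_j^2 + \sum_{j<k} x_j x_k\big(e^{-i(j-k)\lambda} + e^{i(j-k)\lambda}\big) = \sum_{j=1}^{l} x_j^2 + 2\sum_{j<k} x_j x_k\cos((k-j)\lambda) = \sum_{1\le j,k\le l} x_j x_k\cos((k-j)\lambda).$$

As a result, we have

$$\int_0^{\pi}\Big|\sum_{j=1}^{l} x_j e^{-ij\lambda}\Big|^2 d\lambda = \sum_{1\le j,k\le l} x_j x_k\int_0^{\pi}\cos((k-j)\lambda)\,d\lambda = \sum_{j=1}^{l} x_j^2\int_0^{\pi}\cos(0)\,d\lambda + \sum_{j\ne k} x_j x_k\int_0^{\pi}\cos((k-j)\lambda)\,d\lambda = \pi\sum_{j=1}^{l} x_j^2,$$

since $\int_0^{\pi}\cos(m\lambda)\,d\lambda = 0$ for every integer $m \ne 0$. All in all, this yields

$$1 = \operatorname{Var}\Big(\sum_{i=1}^{l} x_i\xi_i\Big) \ge 2C_{\min}\pi^{D-1}\int_0^{\pi}\Big|\sum_{j=1}^{l} x_j e^{-ij\lambda}\Big|^2 d\lambda = 2C_{\min}\pi^{D}\sum_{j=1}^{l} x_j^2.$$

Therefore, the statement of the lemma holds with $K_D = 1/(2C_{\min}\pi^{D})$.

□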
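Since $\sup\{\sum_i x_i^2 : x^\top\Gamma_l x = 1\} = 1/\lambda_{\min}(\Gamma_l)$ for $\Gamma_l = (\gamma(i-j))_{1\le i,j\le l}$, Lemma 1 says that the smallest eigenvalue of $\Gamma_l$ stays bounded away from zero uniformly in $l$. Below is a quick numerical illustration for fractional Gaussian noise with $H = 0.8$. It is our own example and assumes that fGn satisfies the assumption of the lemma. This holds because its spectral density is positive away from the origin and behaves like a constant times $|\lambda|^{D-1}$ at the origin.

```python
import numpy as np


def fgn_autocov(k, hurst):
    """Autocovariance of fractional Gaussian noise."""
    k = np.abs(np.asarray(k, dtype=float))
    return 0.5 * ((k + 1) ** (2 * hurst) - 2 * k ** (2 * hurst) + np.abs(k - 1) ** (2 * hurst))


def covariance_matrix(l, hurst):
    """Toeplitz matrix Gamma_l with entries gamma(i - j), 1 <= i, j <= l."""
    lags = np.abs(np.subtract.outer(np.arange(l), np.arange(l)))
    return fgn_autocov(lags, hurst)


hurst = 0.8
smallest = {l: np.linalg.eigvalsh(covariance_matrix(l, hurst)).min() for l in (10, 50, 200, 800)}
for l, lam in smallest.items():
    # 1 / lam is the smallest constant K_D that works for this particular l
    print(l, round(lam, 4), round(1.0 / lam, 4))
```

By Cauchy's interlacing theorem the smallest eigenvalue can only decrease as $l$ grows. The content of the lemma is that it does not decrease to zero.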


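Lemma 2. Under Assumption 1 there are constants $K_D' < \infty$ and $l_0 \in \mathbb{N}$ such that

$$\Big|\sum_{i=1}^{l} x_i\Big| \le K_D' l^{D/2}$$

for all $l \ge l_0$ and $x_1, \ldots, x_l \in \mathbb{R}$ with $\operatorname{Var}\big(\sum_{i=1}^{l} x_i\xi_i\big) = 1$.

Proof. The statement of the lemma is equivalent to the existence of a constant $C > 0$ such that for all $l \ge l_0$ and all $x_1, \ldots, x_l \in \mathbb{R}$ with $\sum_{i=1}^{l} x_i = 1$ we have

$$\operatorname{Var}\Big(\sum_{i=1}^{l} x_i\xi_i\Big) \ge C l^{-D}.$$

Let $x_1^*, \ldots, x_l^* \in \mathbb{R}$ with $\sum_{i=1}^{l} x_i^* = 1$ be the values that minimize $\operatorname{Var}\big(\sum_{i=1}^{l} x_i\xi_i\big)$. Then $\hat\mu_\xi(\xi_1, \ldots, \xi_l) := \sum_{i=1}^{l} x_i^*\xi_i$ is the best linear unbiased estimator for $\mu := E(\xi_1)$. For a process $(\zeta_n)_{n\in\mathbb{N}}$ with spectral density

$$f_\zeta(x) = \frac{1}{2\pi}\big|1 - e^{ix}\big|^{D-1}$$

we have

$$\operatorname{Var}\big(\hat\mu_\zeta(\zeta_1, \ldots, \zeta_l)\big) \ge C_1 l^{-D}$$

for a constant $C_1 > 0$ by a corollary of Adenstedt [1] (see p. 1101). We rewrite the spectral density of $(\zeta_n)_{n\in\mathbb{N}}$ with the help of the spectral density $f$ of $(\xi_n)_{n\in\mathbb{N}}$ as

$$f_\zeta(x) = f(x)\,\frac{|1 - e^{ix}|^{D-1}}{2\pi|x|^{D-1}L_f(|x|)} =: f(x)\,g(x).$$

Note that the function $g$ is bounded, as $|1 - e^{ix}|/|x| \in [2/\pi, 1]$ for $0 < |x| \le \pi$ and as we assumed that $L_f$ is bounded away from 0. Hence, we have

$$\operatorname{Var}\big(\hat\mu_\xi(\xi_1, \ldots, \xi_l)\big) \ge \frac{1}{\|g\|_\infty}\operatorname{Var}\big(\hat\mu_\zeta(\zeta_1, \ldots, \zeta_l)\big) \ge C l^{-D}$$

for all $l \ge l_0$ by Lemma 4.4 in [1]. □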
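Lemma 2 concerns $\sup\{|\sum_{i=1}^{l}x_i| : x^\top\Gamma_l x = 1\} = (\mathbf{1}^\top\Gamma_l^{-1}\mathbf{1})^{1/2}$, which should grow like $l^{D/2}$. For fGn, $\mathbf{1}^\top\Gamma_l\mathbf{1} = l^{2H}$, and the Cauchy–Schwarz inequality gives $\mathbf{1}^\top\Gamma_l^{-1}\mathbf{1} \ge l^{2}/l^{2H} = l^{D}$. The printed ratio is therefore at least 1, and the lemma says it also stays bounded. Below is a numerical illustration with $H = 0.8$ (our own example; helper names hypothetical):

```python
import numpy as np


def fgn_autocov(k, hurst):
    """Autocovariance of fractional Gaussian noise."""
    k = np.abs(np.asarray(k, dtype=float))
    return 0.5 * ((k + 1) ** (2 * hurst) - 2 * k ** (2 * hurst) + np.abs(k - 1) ** (2 * hurst))


def covariance_matrix(l, hurst):
    """Toeplitz matrix Gamma_l with entries gamma(i - j)."""
    lags = np.abs(np.subtract.outer(np.arange(l), np.arange(l)))
    return fgn_autocov(lags, hurst)


hurst = 0.8
d = 2 - 2 * hurst
ratios = {}
for l in (10, 50, 200, 800):
    ones = np.ones(l)
    sup_abs_sum = np.sqrt(ones @ np.linalg.solve(covariance_matrix(l, hurst), ones))
    ratios[l] = sup_abs_sum / l ** (d / 2)  # bounded above and below if Lemma 2 applies
    print(l, round(ratios[l], 4))
```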
The next lemma deals with the ρ-mixing coefficient, which is defined in the following way: Let $\mathcal{A}$, $\mathcal{B}$ be two σ-fields. Then

$$\rho(\mathcal{A}, \mathcal{B}) := \sup\operatorname{corr}(X, Y),$$

where the supremum is taken over all square-integrable $\mathcal{A}$-measurable random variables $X$ and all square-integrable $\mathcal{B}$-measurable random variables $Y$. For details we recommend the book of Bradley [?].

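Lemma 3. Under Assumption [?] there are constants $C_1, C_2 < \infty$ such that

$$\rho(k,l) := \rho\big(\sigma(\xi_i, 1 \le i \le l), \sigma(\xi_j, k+l+1 \le j \le k+2l)\big) \le C_1(k/l)^{-D}L_\gamma(k) + C_2 l^2 k^{-D-1}\max\{L_\gamma(k), 1\}$$

for all $k \in \mathbb{N}$ and all $l \in \{l_0, \ldots, k\}$.

Proof. Kolmogorov and Rozanov [5] proved that there exist real numbers $a_1, a_2, \ldots, a_l$, $b_1, b_2, \ldots, b_l$ such that

$$\rho\big(\sigma(\xi_i, 1 \le i \le l), \sigma(\xi_j, k+l+1 \le j \le k+2l)\big) = \operatorname{Cov}\Big(\sum_{i=1}^{l} a_i\xi_i, \sum_{j=1}^{l} b_j\xi_{k+l+j}\Big)$$

and $\operatorname{Var}\big(\sum_{i=1}^{l} a_i\xi_i\big) = \operatorname{Var}\big(\sum_{j=1}^{l} b_j\xi_{k+l+j}\big) = 1$. The triangle inequality yields

$$\Big|\operatorname{Cov}\Big(\sum_{i=1}^{l} a_i\xi_i, \sum_{j=1}^{l} b_j\xi_{k+l+j}\Big)\Big| \le \Big|\sum_{i=1}^{l} a_i\Big|\Big|\sum_{j=1}^{l} b_j\Big||\gamma(k)| + \sum_{i=1}^{l}\sum_{j=1}^{l}|a_i||b_j||\gamma(k) - \gamma(k+l+j-i)|.$$

We treat the two summands on the right-hand side separately. For the first term, it follows by Lemma 2 that

$$\Big|\sum_{i=1}^{l} a_i\Big|\Big|\sum_{j=1}^{l} b_j\Big||\gamma(k)| \le (K_D')^2 l^{D}L_\gamma(k)k^{-D}.$$

Before we deal with the second summand, we observe that by Hölder's inequality and Lemma 1

$$\sum_{i=1}^{l}|a_i| \le \sqrt{l}\Big(\sum_{i=1}^{l} a_i^2\Big)^{1/2} \le \sqrt{K_D}\sqrt{l} \quad\text{and}\quad \sum_{j=1}^{l}|b_j| \le \sqrt{l}\Big(\sum_{j=1}^{l} b_j^2\Big)^{1/2} \le \sqrt{K_D}\sqrt{l}.$$

Due to Assumption 2,

$$\sup_{|\tilde k - k| \le 2l-1}\frac{|L_\gamma(k) - L_\gamma(\tilde k)|}{L_\gamma(k)} \le K\frac{l}{k}$$

for some constant $K$. Consequently, for all $\tilde k \in \{k+1, \ldots, k+2l-1\}$

$$\begin{aligned}
|\gamma(k) - \gamma(\tilde k)| &\le L_\gamma(k)\big|k^{-D} - \tilde k^{-D}\big| + |L_\gamma(k) - L_\gamma(\tilde k)|\tilde k^{-D}\\
&\le L_\gamma(k)\big(k^{-D} - (k+2l-1)^{-D}\big) + |L_\gamma(k) - L_\gamma(\tilde k)|k^{-D}\\
&\le C_4 k^{-D-1} l L_\gamma(k) + K l k^{-D-1}\max\{L_\gamma(k), 1\}\\
&\le C_3 k^{-D-1} l\max\{L_\gamma(k), 1\}
\end{aligned}$$

for some constants $C_3, C_4$. Combining this with the bounds for $\sum_{i=1}^{l}|a_i|$ and $\sum_{j=1}^{l}|b_j|$, we finally arrive at

$$\sum_{i=1}^{l}\sum_{j=1}^{l}|a_i||b_j||\gamma(k) - \gamma(k+l+j-i)| \le K_D l\max_{\tilde k \in \{k+1, \ldots, k+2l-1\}}|\gamma(k) - \gamma(\tilde k)| \le K_D C_3 k^{-D-1} l^2\max\{L_\gamma(k), 1\}.$$

Together with the bound for the first summand, this proves the lemma with $C_1 = (K_D')^2$ and $C_2 = K_D C_3$.

□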
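For Gaussian blocks, the result of Kolmogorov and Rozanov identifies $\rho(k,l)$ with the largest canonical correlation between $(\xi_1, \ldots, \xi_l)$ and $(\xi_{k+l+1}, \ldots, \xi_{k+2l})$. This is the largest singular value of $L^{-1}\Sigma_{12}L^{-\top}$, where $\Gamma_l = LL^\top$ is a Cholesky factorization and $\Sigma_{12} = (\gamma(k+l+j-i))_{1\le i,j\le l}$. The sketch below computes it for fGn and makes the decay of $\rho(k,l)$ in $k$ visible. It is our own illustration, and the function names are hypothetical.

```python
import numpy as np


def fgn_autocov(k, hurst):
    """Autocovariance of fractional Gaussian noise."""
    k = np.abs(np.asarray(k, dtype=float))
    return 0.5 * ((k + 1) ** (2 * hurst) - 2 * k ** (2 * hurst) + np.abs(k - 1) ** (2 * hurst))


def covariance_matrix(l, hurst):
    """Toeplitz matrix Gamma_l with entries gamma(i - j)."""
    lags = np.abs(np.subtract.outer(np.arange(l), np.arange(l)))
    return fgn_autocov(lags, hurst)


def rho_gaussian(k, l, hurst):
    """Largest canonical correlation between (xi_1..xi_l) and (xi_{k+l+1}..xi_{k+2l})."""
    chol = np.linalg.cholesky(covariance_matrix(l, hurst))
    idx = np.arange(1, l + 1)
    cross = fgn_autocov(k + l + idx[None, :] - idx[:, None], hurst)  # [i, j] = Cov(xi_i, xi_{k+l+j})
    left = np.linalg.solve(chol, cross)         # L^{-1} Sigma_12
    whitened = np.linalg.solve(chol, left.T).T  # L^{-1} Sigma_12 L^{-T}
    return float(np.linalg.svd(whitened, compute_uv=False)[0])


rhos = {k: rho_gaussian(k, 5, 0.8) for k in (10, 100, 1000)}
for k, value in rhos.items():
    print(k, round(value, 4))
```

Since $\xi_l$ and $\xi_{k+l+1}$ belong to the two blocks, $\rho(k,l) \ge \gamma(k+1)$ always holds.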


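5.2. Proof of the Main Result. Let $t$ be a point of continuity of $F_T$. In order to simplify notation, we write $N = n - l + 1$ and $T_{l,i} = T_l(X_i, \ldots, X_{i+l-1})$. The triangle inequality yields

$$|F_{l,n}(t) - F_{T_n}(t)| \le |F_{l,n}(t) - F_T(t)| + |F_T(t) - F_{T_n}(t)|.$$

The second term on the right-hand side of the above inequality converges to zero because of Assumption 1. As $L^2$-convergence implies stochastic convergence, it suffices to show that

$$E\big(|F_{l,n}(t) - F_T(t)|^2\big) \to 0$$

in order to prove that the first term converges to zero as well. We have

$$\begin{aligned}
E\big(|F_{l,n}(t) - F_T(t)|^2\big) &= E\big(F_{l,n}^2(t)\big) - \big(EF_{l,n}(t)\big)^2 + \big(F_T(t)\big)^2 - 2F_T(t)EF_{l,n}(t) + \big(EF_{l,n}(t)\big)^2\\
&= \operatorname{Var}\big(F_{l,n}(t)\big) + \big|EF_{l,n}(t) - F_T(t)\big|^2.
\end{aligned}$$

Furthermore, stationarity of the process $(X_n)_{n\in\mathbb{N}}$ and Assumption 1 imply

$$EF_{l,n}(t) = \frac{1}{N}\sum_{i=1}^{N}P(T_{l,i} \le t) = F_{T_l}(t) \to F_T(t).$$

It remains to show that $\operatorname{Var}(F_{l,n}(t)) \to 0$. Again, it follows by stationarity that

$$\begin{aligned}
\operatorname{Var}\big(F_{l,n}(t)\big) &= \frac{1}{N}\operatorname{Var}\big(1_{\{T_{l,1}\le t\}}\big) + \frac{2}{N^2}\sum_{i=2}^{N}(N-i+1)\operatorname{Cov}\big(1_{\{T_{l,1}\le t\}}, 1_{\{T_{l,i}\le t\}}\big)\\
&\le \frac{2}{N}\sum_{i=1}^{N}\big|\operatorname{Cov}\big(1_{\{T_{l,1}\le t\}}, 1_{\{T_{l,i}\le t\}}\big)\big|.
\end{aligned}$$

Recall that by Assumption 3 we have $l \le C_1 n^{(1+D)/2-\epsilon}$ for some constants $C_1$ and $\epsilon > 0$. For $n$ large enough such that $l \le \frac{1}{2}\lfloor n^{1-\epsilon/2}\rfloor$, we split the sum of covariances into two parts:

$$\begin{aligned}
\frac{1}{N}\sum_{i=1}^{N}\big|\operatorname{Cov}\big(1_{\{T_{l,1}\le t\}}, 1_{\{T_{l,i}\le t\}}\big)\big| &= \frac{1}{N}\sum_{i=1}^{\lfloor n^{1-\epsilon/2}\rfloor}\big|\operatorname{Cov}\big(1_{\{T_{l,1}\le t\}}, 1_{\{T_{l,i}\le t\}}\big)\big| + \frac{1}{N}\sum_{i=\lfloor n^{1-\epsilon/2}\rfloor+1}^{N}\big|\operatorname{Cov}\big(1_{\{T_{l,1}\le t\}}, 1_{\{T_{l,i}\le t\}}\big)\big|\\
&\le \frac{\lfloor n^{1-\epsilon/2}\rfloor}{N} + \frac{1}{N}\sum_{k=\lfloor n^{1-\epsilon/2}\rfloor+1}^{N}\rho\big(\sigma(X_i, 1 \le i \le l), \sigma(X_j, k \le j \le k+l-1)\big)\\
&\le \frac{\lfloor n^{1-\epsilon/2}\rfloor}{N} + \frac{1}{N}\sum_{k=\lfloor n^{1-\epsilon/2}\rfloor-l}^{N-l-1}\rho(k,l),
\end{aligned}$$

where

$$\rho(k,l) := \rho\big(\sigma(X_i, 1 \le i \le l), \sigma(X_j, k+l+1 \le j \le k+2l)\big).$$

Obviously, the first summand converges to zero by Assumption 3. For the second summand note that, as a consequence of Potter's Theorem (Theorem 1.5.6 in the book of Bingham, Goldie and Teugels [?]), there is a constant $C_L$ such that $L_\gamma(k) \le C_L k^{D\epsilon/2}$ for all $k \in \mathbb{N}$. This together with Lemma 3 (note that $\sigma(X_i, i \in I) \subseteq \sigma(\xi_i, i \in I)$ for every index set $I$, so the mixing coefficients of $(X_n)$ are bounded by those of $(\xi_n)$) yields

$$\begin{aligned}
\frac{1}{N}\sum_{k=\lfloor n^{1-\epsilon/2}\rfloor-l}^{N-l-1}\rho(k,l) &\le C_L C_1\frac{l^D}{N}\sum_{k=\lfloor n^{1-\epsilon/2}\rfloor/2}^{N-l-1}k^{-D}k^{D\epsilon/2} + C_L C_2\frac{l^2}{N}\sum_{k=\lfloor n^{1-\epsilon/2}\rfloor/2}^{N-l-1}k^{-D-1}k^{D\epsilon/2}\\
&\le C_L C_1 C_1^D 2^{D(1-\epsilon/2)}n^{D\left(((1+D)/2-\epsilon)-(1-\epsilon/2)+(\epsilon/2)(1-\epsilon/2)\right)}\\
&\quad + C_L C_2 C_1^2 2^{1+D(1-\epsilon/2)}n^{(1+D-2\epsilon)-(D+1)(1-\epsilon/2)+(1-\epsilon/2)D\epsilon/2}\\
&\le C\left(n^{-D\left((1-D)/2+\epsilon^2/4\right)} + n^{-\epsilon\left(3/2-D+D\epsilon/4\right)}\right) \to 0
\end{aligned}$$

for some constant $C < \infty$. Thus, we have proved that $\operatorname{Var}(F_{l,n}(t)) \to 0$ as $n \to \infty$ and that the first assertion of Theorem 1 holds.

The second assertion of Theorem 1 follows from

$$F_{T_n}(t) - F_{l,n}(t) \xrightarrow{P} 0$$

by the usual Glivenko–Cantelli argument for the uniform convergence of empirical distribution functions; see for example Section 20 in the book of Billingsley [?]. □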
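Appendix A. A Modified Change Point Test for Data with Ties

If the distribution of $X_i = G(\xi_i)$ is not continuous, there is a positive probability that $X_i = X_j$ for some $i \ne j$, so there might be ties in the sample. We propose to use the following test statistic based on the modified ranks $\tilde R_i = \sum_{j=1}^{n}\big(1_{\{X_j < X_i\}} + \frac{1}{2}1_{\{X_j = X_i\}}\big)$:

$$T_n(\tau_1,\tau_2) := \max_{k\in\{\lfloor n\tau_1\rfloor, \ldots, \lfloor n\tau_2\rfloor\}}\frac{\big|\sum_{i=1}^{k}\tilde R_i - \frac{k}{n}\sum_{i=1}^{n}\tilde R_i\big|}{\Big\{\frac{1}{n}\sum_{t=1}^{k}S_t^2(1,k) + \frac{1}{n}\sum_{t=k+1}^{n}S_t^2(k+1,n)\Big\}^{1/2}},$$

where

$$S_t(j,k) := \sum_{h=j}^{t}\big(\tilde R_h - \bar R_{j,k}\big), \qquad \bar R_{j,k} := \frac{1}{k-j+1}\sum_{i=j}^{k}\tilde R_i.$$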
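The statistic can be evaluated directly from its definition. Below is a minimal O(n²) Python sketch under our reading of the formula above. The function names and the defaults $\tau_1 = 0.15$, $\tau_2 = 0.85$ are hypothetical choices for illustration. $k$ ranges over $\lfloor n\tau_1\rfloor, \ldots, \lfloor n\tau_2\rfloor$, and values of $k$ that would leave a segment empty are skipped.

```python
import numpy as np


def modified_ranks(x):
    """R~_i = sum_j (1{X_j < X_i} + 0.5 * 1{X_j = X_i})."""
    x = np.asarray(x, dtype=float)
    return (x[None, :] < x[:, None]).sum(axis=1) + 0.5 * (x[None, :] == x[:, None]).sum(axis=1)


def sum_sq_partial_sums(r):
    """sum_t S_t(j, k)^2 for one segment r = (R~_j, ..., R~_k), centered at its own mean."""
    s = np.cumsum(r - r.mean())
    return float(np.sum(s ** 2))


def self_normalized_wilcoxon(x, tau1=0.15, tau2=0.85):
    r = modified_ranks(x)
    n = len(r)
    partial = np.cumsum(r)  # partial[k - 1] = sum_{i <= k} R~_i
    best = 0.0
    for k in range(int(np.floor(n * tau1)), int(np.floor(n * tau2)) + 1):
        if k < 1 or k >= n:
            continue  # one of the two segments would be empty
        numerator = abs(partial[k - 1] - k / n * partial[-1])
        denominator = (sum_sq_partial_sums(r[:k]) + sum_sq_partial_sums(r[k:])) / n
        if denominator > 0:
            best = max(best, numerator / np.sqrt(denominator))
    return best


rng = np.random.default_rng(0)
x = rng.standard_normal(200)
shifted = x.copy()
shifted[100:] += 5.0  # level shift after observation 100
print(self_normalized_wilcoxon(x), self_normalized_wilcoxon(shifted))
```

Only ranks enter the statistic, so its value is unchanged under strictly increasing transformations of the data.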

To be able to apply subsampling, we need $T_n$ to converge in distribution, which we show next:


Lemma 4. Let $(\xi_n)_{n\in\mathbb{N}}$ be a stationary sequence of centered standard Gaussian variables with covariance function $\gamma(k) = k^{-D}L_\gamma(k)$ for a $D \in (0,1)$ and a slowly varying function $L_\gamma$. Let $X_i = G(\xi_i)$ for a function $G$ that is piecewise monotone on finitely many pieces. Then $T_n(\tau_1,\tau_2) \Rightarrow T$ for some random variable $T$.

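Proof. Let $h(x,y) = 1_{\{G(x)<G(y)\}} + \frac{1}{2}1_{\{G(x)=G(y)\}} - \frac{1}{2}$. We define the modified Wilcoxon process $(W_n(\lambda))_{\lambda\in[0,1]}$ by

$$W_n(\lambda) := \frac{1}{nd_n}\sum_{i=1}^{\lfloor n\lambda\rfloor}\sum_{j=\lfloor n\lambda\rfloor+1}^{n}h(\xi_i,\xi_j)$$

with $d_n = \sqrt{\operatorname{Var}\big(\sum_{i=1}^{n}\xi_i\big)}$. From Theorem 2.2 in Dehling, Rooch and Wendler [23], we have the weak convergence of this process $W_n$ to the limit process $W$ with

$$W(\lambda) = -(1-\lambda)Z(\lambda)\int\varphi(x)\,d\tilde h(x) - \lambda\big(Z(1) - Z(\lambda)\big)\int\!\!\int\varphi(y)\,d_y h(x,y)\,\varphi(x)\,dx.$$

Here, $Z$ is a fractional Brownian motion, $\varphi$ is the density function of the standard normal distribution and $\tilde h(x) = E[h(x,\xi_1)]$. Following the proof of Theorem 1 in [?], we can express $T_n(\tau_1,\tau_2)$ as a function of $W_n$:

$$T_n(\tau_1,\tau_2) = \sup_{\tau_1\le\lambda\le\tau_2}\frac{|W_n(\lambda)|}{\Big\{\int_0^{\lambda}\Big(W_n(t) - \frac{c_n(t)}{c_n(\lambda)}W_n(\lambda)\Big)^2dt + \int_{\lambda}^{1}\Big(W_n(t) - \frac{1-c_n(t)}{1-c_n(\lambda)}W_n(\lambda)\Big)^2dt\Big\}^{1/2}}$$

with $c_n(t) := \lfloor nt\rfloor/n$. Note that $c_n(\lambda)$ converges to $\lambda$ uniformly, so we have the asymptotic equivalence

$$T_n(\tau_1,\tau_2) \simeq \sup_{\tau_1\le\lambda\le\tau_2}\frac{|W_n(\lambda)|}{\Big\{\int_0^{\lambda}\Big(W_n(t) - \frac{t}{\lambda}W_n(\lambda)\Big)^2dt + \int_{\lambda}^{1}\Big(W_n(t) - \frac{1-t}{1-\lambda}W_n(\lambda)\Big)^2dt\Big\}^{1/2}}.$$

By the continuous mapping theorem, we get

$$T_n(\tau_1,\tau_2) \xrightarrow{\mathcal{D}} \sup_{\tau_1\le\lambda\le\tau_2}\frac{|W(\lambda)|}{\Big\{\int_0^{\lambda}\Big(W(t) - \frac{t}{\lambda}W(\lambda)\Big)^2dt + \int_{\lambda}^{1}\Big(W(t) - \frac{1-t}{1-\lambda}W(\lambda)\Big)^2dt\Big\}^{1/2}} =: T.$$

□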


Appendix B. A Test for Multiple Change Points 


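For testing against the alternative of two change points, we suggest using the test statistic $T_n(\tau_1,\tau_2,\epsilon) = \sup|G_n(k_1,k_2)|$, where the supremum is taken over all pairs $(k_1,k_2)$ with $\tau_1 \le k_1/n < k_2/n \le \tau_2$ and $(k_2-k_1)/n > \epsilon$. Some calculations yield

$$\begin{aligned}
G_n(k_1,k_2) = {}&\frac{|W_n(\lambda_1,\lambda_2)|}{\Big\{\int_0^{\lambda_1}\Big(W_n(r,\lambda_2) - \frac{r}{\lambda_1}W_n(\lambda_1,\lambda_2)\Big)^2dr + \int_{\lambda_1}^{\lambda_2}\Big(W_n(r,\lambda_2) - \frac{\lambda_2-r}{\lambda_2-\lambda_1}W_n(\lambda_1,\lambda_2)\Big)^2dr\Big\}^{1/2}}\\
&+ \frac{|W_n^*(\lambda_2,\lambda_1)|}{\Big\{\int_{\lambda_1}^{\lambda_2}\Big(W_n^*(r,\lambda_1) - \frac{r-\lambda_1}{\lambda_2-\lambda_1}W_n^*(\lambda_2,\lambda_1)\Big)^2dr + \int_{\lambda_2}^{1}\Big(W_n^*(r,\lambda_1) - \frac{1-r}{1-\lambda_2}W_n^*(\lambda_2,\lambda_1)\Big)^2dr\Big\}^{1/2}} + o_P(1)
\end{aligned}$$

with $\lambda_i := k_i/n$, $i = 1, 2$,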


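where

$$W_n(\lambda,\tau) := w_n(\lambda,\lambda) - w_n(\lambda,\tau), \qquad W_n^*(\lambda,\tau) := w_n(\lambda,\lambda) - w_n(\tau,\lambda)$$

with

$$w_n(\lambda,\tau) := \sum_{i=1}^{\lfloor n\lambda\rfloor}\sum_{j=\lfloor n\tau\rfloor+1}^{n}\Big(1_{\{X_i\le X_j\}} - \frac{1}{2}\Big), \qquad 0 \le \lambda \le \tau \le 1.$$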


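Define

$$d_n^2 := \operatorname{Var}\Big(\sum_{i=1}^{n}H_r(\xi_i)\Big),$$

where $H_r$ denotes the $r$-th order Hermite polynomial and $r$ designates the Hermite rank of the class of functions $\{1_{\{G(\xi_i)\le x\}} - F(x), x \in \mathbb{R}\}$. It can be shown that $\frac{1}{nd_n}w_n(\lambda,\tau)$ converges in distribution to

$$\Big\{(1-\tau)Z_r(\lambda) - \lambda\big(Z_r(1) - Z_r(\tau)\big)\Big\}\frac{1}{r!}\int J_r(x)\,dF(x), \qquad 0 \le \lambda \le \tau \le 1,$$

where $Z_r$ is an $r$-th order Hermite process with Hurst parameter $H := \max\{1 - \frac{rD}{2}, \frac{1}{2}\}$, and where

$$J_r(x) := E\big(H_r(\xi_1)1_{\{G(\xi_1)\le x\}}\big).$$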
As a result, under the hypothesis the limiting distribution of $T_n(\tau_1,\tau_2,\epsilon)$ is given by $T(r,\tau_1,\tau_2,\epsilon) = \sup_{\tau_1\le\lambda_1<\lambda_2\le\tau_2,\ \lambda_2-\lambda_1>\epsilon}G_r(\lambda_1,\lambda_2)$ with

G r (A 1 ,A 2 ) 


Zr (AO - £Zr(A 2 ) 


/ 0 Al (Z r (t) - ^Z r (A 1 )) 2 dt + /^(Z r (t) - £^Z r (A 2 ) - A^Z r (Ai)) 2 di 


/(Z,.(t) - ^^(Ar) - ^Z r (X 2 )) 2 dt + f (Z r (t ) - ££Z r (A 2 ) - 


Ruhr-Universitat Bochum, Germany 
E-mail address : annika.betken@rub.de 

Ernst-Moritz-Arndt-Universitat Greifswald, Germany 
E-mail address : martin.wendler@uni-greifswald.de 



